bwhiteman closed this issue 8 years ago
We should include the domain Id.
I'd rename "topLevelDomains" to "rootUrls". We'll need to do a search for unique ones which should be easy enough.
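A minimal sketch of that uniqueness search, assuming a root URL means the scheme plus host (and port, if any) of each trail URL. The function name `root_urls` and the sample inputs are hypothetical and not taken from the project's code.

```python
from urllib.parse import urlsplit


def root_urls(urls):
    """Return the unique scheme://host roots of the given URLs, in first-seen order."""
    seen = {}  # dict used as an ordered set
    for url in urls:
        parts = urlsplit(url)
        if not parts.scheme or not parts.netloc:
            continue  # skip relative or malformed URLs
        # Host names are case-insensitive, so normalise before comparing.
        root = f"{parts.scheme}://{parts.netloc.lower()}"
        seen.setdefault(root, None)
    return list(seen)
```

Whether `http` and `https` variants of the same host should count as one root is a design choice; this sketch keeps them separate.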
I understand SearchTerms, Domain Entity Types, and Domain Entities. What are commonEntities?
We'll need to add the URL relevance to the urls array. I do think we should include ids on each appropriate item (overall domain, urls, domain entities) in case we want to retrieve results from a third party who uses this export.
rootUrls is fine. commonEntities would be the most frequently extracted entities. I'm not sure how useful this will be, but we can see.
I agree everything should probably be an object array of {"id": "1223", "value": "value"}
Since we don't currently have commonEntities, can we just include ExtractedEntities (with id, occurrences, url extracted from, etc.)?
That's what I meant, just pick an arbitrary number like the top 20 extracted entities with the highest counts.
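As a rough illustration of picking the top 20, the sketch below assumes each extracted entity is already a record with `id`, `value`, and `occurrences` keys. Those field names, and the function name `top_extracted_entities`, are assumptions for the example rather than the project's actual schema.

```python
def top_extracted_entities(entities, limit=20):
    """Return up to `limit` entity records with the highest occurrence counts.

    Each record is assumed to look like
    {"id": "1223", "value": "value", "occurrences": 7} (hypothetical shape).
    Ties keep their original relative order because sorted() is stable.
    """
    ranked = sorted(entities, key=lambda e: e["occurrences"], reverse=True)
    return ranked[:limit]
```

The cutoff is a parameter because 20 is an arbitrary choice; changing it later would not require touching the ranking logic.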
Looks like it's working as described. JSON exports and has all the sections with appropriate content. Couldn't get domain items manually added because that panel stuff was in a different branch, but the section was there in the JSON.
@bmcdougald @michaelsframe When I try this on a larger domain, I only get a small amount of the domain, with no logging.
Fixed in 100/90
We need to be able to export a domain for use by the crawling teams. The exported domain should be the aggregate of all of the trails within it. (At some point we may want to be able to choose specific trails.)
The format should be similar to