alephdata / aleph

Search and browse documents and data; find the people and companies you look for.
http://docs.aleph.occrp.org
MIT License

Allow users to export results from Entity References Mode #1860

Open kjacks opened 3 years ago

kjacks commented 3 years ago

Enable the export button in the UI for query results in Entity References Mode. For example, allowing users to export all Directorships that a Person is involved with.

kjacks commented 3 years ago

@sunu the API returns a links.export URL for these queries, but would the exported CSV be meaningful as it's currently formulated? I.e., would it export a list of Intervals with the referred IDs, without populating the referred entities themselves?

Rosencrantz commented 3 years ago

My question here would be: why might a user want to export this data? And, as a follow-up, what would need to happen for them to be able to complete the work inside Aleph?

sunu commented 3 years ago

Didn't notice we already have an issue for this. This is exactly the problem Dhruv was trying to solve last week, and he had to write a script to get it done. We should of course tweak the export logic to populate the referred entities and make the export more useful. Maybe we can talk to @brrttwrks and come up with a more useful export structure.

@Rosencrantz IMO, the main use case is to take this data into Excel, since most journalists are more comfortable with that. Also, there is currently no way in Aleph to collect this data from several target entities into one list to do any kind of analysis. Once we add the ability to reference entities from other datasets in a list, it should become a bit more convenient to do this kind of work inside Aleph itself.

brrttwrks commented 2 years ago

Related: people want to export lists as CSVs or XLSX workbooks. The resulting export should, of course, have the Aleph ID references where Intervals are concerned, but it should also have the names in a human-readable form. In our case, where FtM represents many-to-many relationships, this presents a problem. Users want this to be easy, but spreadsheets are two-dimensional. One table per entity type with reference IDs (foreign keys) is easiest to build, but might be the least useful. Pre-joining tables might be useful, but which ones? I am leaning towards the former to save our sanity. I take that back: I think the Thing tables are OK, but the Interval tables are just not useful for humans. We should be doing joins so that we can give people tables with the actual names instead of a table of foreign keys.
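
To make the join idea concrete, here is a minimal sketch in Python. It is not Aleph's export code; it assumes entities arrive as plain dicts shaped like FtM JSON (`id`, `schema`, `properties` with list values), and the helper names `caption`, `join_intervals` and `to_csv` are made up for illustration. Each Interval row keeps the referenced IDs and gains a name column per reference; when a property holds several IDs, one row is written per combination, which is one possible way to flatten a many-to-many link into a two-dimensional sheet.

```python
import csv
import io
import itertools


def caption(entity):
    """Return the first name of an entity, falling back to its ID."""
    names = entity.get("properties", {}).get("name", [])
    return names[0] if names else entity["id"]


def join_intervals(intervals, things, ref_props):
    """Flatten Intervals into rows holding both referenced IDs and names.

    ref_props lists the Interval properties that point at other entities,
    e.g. ["director", "organization"] (assumed property names).
    """
    by_id = {thing["id"]: thing for thing in things}
    rows = []
    for interval in intervals:
        props = interval.get("properties", {})
        # One list of referenced IDs per property; [""] keeps the row
        # even when a property is missing or empty.
        id_lists = [props.get(prop) or [""] for prop in ref_props]
        for combo in itertools.product(*id_lists):
            row = {"id": interval["id"], "schema": interval["schema"]}
            for prop, ref_id in zip(ref_props, combo):
                ref = by_id.get(ref_id)
                row[f"{prop}_id"] = ref_id
                # Leave the name blank if the referenced entity is unknown.
                row[f"{prop}_name"] = caption(ref) if ref else ""
            rows.append(row)
    return rows


def to_csv(rows):
    """Serialise the joined rows to CSV text."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Illustrative data only, not taken from a real dataset.
things = [
    {"id": "p1", "schema": "Person", "properties": {"name": ["Jane Doe"]}},
    {"id": "c1", "schema": "Company", "properties": {"name": ["Acme Ltd"]}},
]
intervals = [
    {
        "id": "d1",
        "schema": "Directorship",
        "properties": {"director": ["p1"], "organization": ["c1"]},
    }
]
rows = join_intervals(intervals, things, ["director", "organization"])
csv_text = to_csv(rows)
```

Whether one row per combination is the right flattening is exactly the open question here; the sketch only shows that keeping IDs and names side by side is cheap once the referenced entities are loaded.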