RobokopU24 / Feedback

Feedback on the ROBOKOP project
https://robokop.renci.org

csv dump of answer sets #172

Open karafecho opened 1 year ago

karafecho commented 1 year ago

This issue is to suggest that we add a feature to allow users to download csv dumps of answer sets and individual results, in addition to JSON dumps.

karafecho commented 2 weeks ago

Update: More than one external user has requested a csv or txt dump of results. I strongly suggest that we prioritize this request.

karafecho commented 2 weeks ago

Note that one reason we have not offered non-JSON formats is that converting to csv or txt, for example, would produce a very messy, unreadable file because of the many nested attributes and node/edge properties. We may want to simplify the output by providing, for example, subject + predicate + object + primary/aggregator knowledge source + publications + statements. Of course, we will need to confirm that a simplified output file is acceptable to users.
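For illustration only, here is a rough Python sketch of what such a simplified export could look like. It assumes a TRAPI-style knowledge graph in which each edge carries `subject`, `predicate`, `object`, a `sources` list with `resource_id`/`resource_role`, and an `attributes` list where publications use the `biolink:publications` type. `SAMPLE_KG`, `edge_to_row`, and `kg_to_csv` are hypothetical names, the sample data (including the PMIDs) is made up, and the "statements" field is left out because its location in the response is not described in this thread.

```python
import csv
import io

# Hypothetical, heavily trimmed TRAPI-style knowledge graph (assumed layout, made-up values).
SAMPLE_KG = {
    "nodes": {
        "CHEBI:6801": {"name": "metformin"},
        "MONDO:0005148": {"name": "type 2 diabetes mellitus"},
    },
    "edges": {
        "e0": {
            "subject": "CHEBI:6801",
            "predicate": "biolink:treats",
            "object": "MONDO:0005148",
            "sources": [
                {"resource_id": "infores:ctd",
                 "resource_role": "primary_knowledge_source"},
                {"resource_id": "infores:automat-robokop",
                 "resource_role": "aggregator_knowledge_source"},
            ],
            "attributes": [
                {"attribute_type_id": "biolink:publications",
                 "value": ["PMID:0000001", "PMID:0000002"]},  # placeholder IDs
            ],
        }
    },
}

FIELDS = [
    "subject", "subject_name", "predicate", "object", "object_name",
    "primary_knowledge_source", "aggregator_knowledge_sources", "publications",
]


def edge_to_row(edge, nodes):
    """Flatten one edge into a dict of plain strings; multi-valued cells are '|'-joined."""
    sources = edge.get("sources") or []
    primary = [s["resource_id"] for s in sources
               if s.get("resource_role") == "primary_knowledge_source"]
    aggregators = [s["resource_id"] for s in sources
                   if s.get("resource_role") == "aggregator_knowledge_source"]
    publications = []
    for attr in edge.get("attributes") or []:
        if attr.get("attribute_type_id") == "biolink:publications":
            value = attr.get("value")
            publications.extend(value if isinstance(value, list) else [value])
    return {
        "subject": edge["subject"],
        "subject_name": nodes.get(edge["subject"], {}).get("name", ""),
        "predicate": edge["predicate"],
        "object": edge["object"],
        "object_name": nodes.get(edge["object"], {}).get("name", ""),
        "primary_knowledge_source": "|".join(primary),
        "aggregator_knowledge_sources": "|".join(aggregators),
        "publications": "|".join(publications),
    }


def kg_to_csv(kg):
    """Return the whole knowledge graph as CSV text, one row per edge."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    nodes = kg.get("nodes", {})
    for edge in kg.get("edges", {}).values():
        writer.writerow(edge_to_row(edge, nodes))
    return buf.getvalue()


if __name__ == "__main__":
    print(kg_to_csv(SAMPLE_KG))
```

Joining multi-valued fields with `|` keeps one edge per row, at the cost of dropping every attribute that is not listed in `FIELDS`.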

EvanDietzMorris commented 2 weeks ago

To add to the messiness point: the problem isn't only the attributes that are actually nested. TRAPI results in general are made up of many dictionaries. Even primary and aggregator knowledge sources are represented in a dictionary format that would be complicated to translate into CSV in a meaningful way. Our examples are usually very simple, but the TRAPI spec allows for complex chains of provenance. In the latest version of Plater (not yet deployed) we will also have support/aux graphs representing subclass edges, which are referenced by other results in a way that is easy to read in TRAPI/JSON but would be difficult to represent in CSV.

In short, it'd be easy to spit out very simplified versions of results, but there is A LOT of stuff in TRAPI that is easier to read and understand in json, and would require quite a bit of work and design to transform into CSV. The compromise would be to include json strings as values inside the csv, which somewhat defeats the point.
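To make that compromise concrete, here is a small Python sketch that embeds the provenance of a single edge as a JSON string inside one CSV cell. The edge is a made-up example that assumes a TRAPI-style `sources` list with `resource_id`, `resource_role`, and `upstream_resource_ids`; the column name `sources_json` is hypothetical.

```python
import csv
import io
import json

# Made-up edge with a two-step provenance chain (assumed TRAPI-style layout).
edge = {
    "subject": "CHEBI:6801",
    "predicate": "biolink:treats",
    "object": "MONDO:0005148",
    "sources": [
        {"resource_id": "infores:ctd",
         "resource_role": "primary_knowledge_source"},
        {"resource_id": "infores:automat-robokop",
         "resource_role": "aggregator_knowledge_source",
         "upstream_resource_ids": ["infores:ctd"]},
    ],
}

# The nested structure is serialized into a single cell.
row = {
    "subject": edge["subject"],
    "predicate": edge["predicate"],
    "object": edge["object"],
    "sources_json": json.dumps(edge["sources"], sort_keys=True),
}

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(row), lineterminator="\n")
writer.writeheader()
writer.writerow(row)
csv_text = buf.getvalue()

# Nothing is lost, but a consumer has to parse JSON again to use the cell.
parsed = next(csv.DictReader(io.StringIO(csv_text)))
recovered = json.loads(parsed["sources_json"])

if __name__ == "__main__":
    print(csv_text)
```

The round trip is lossless, but the cell is full of escaped quotes and is not something a spreadsheet user can filter or sort on, which is the sense in which it defeats the point.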

karafecho commented 2 weeks ago

FWIW, I asked one of the external users who requested a csv or tsv dump if the minimal fields I suggested above would be sufficient to support this person's needs. Will update this ticket after receiving a response.

karafecho commented 2 weeks ago

Additional feedback from external user:

"All I need is the subject, predicate, object and publication. The other fields would be nice. I am not sure if you have the time, but maybe you could add a ‘check box’ for the attributes to include."
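One possible shape for the "check box" idea, sketched in Python: the four fields the user asked for are always exported, and any optional columns the user ticks are appended. The names `REQUIRED`, `OPTIONAL`, and `rows_to_csv` are hypothetical, the sample row is made up, and the rows are assumed to be already flattened into plain dicts.

```python
import csv
import io

# Columns the external user said they need; always included.
REQUIRED = ("subject", "predicate", "object", "publications")
# Columns that could sit behind check boxes (an assumed, illustrative set).
OPTIONAL = ("primary_knowledge_source", "aggregator_knowledge_sources")


def rows_to_csv(rows, extra_columns=()):
    """Write already-flattened rows as CSV, keeping REQUIRED plus the ticked OPTIONAL columns."""
    unknown = set(extra_columns) - set(OPTIONAL)
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    # Preserve a stable column order regardless of the order the boxes were ticked in.
    columns = list(REQUIRED) + [c for c in OPTIONAL if c in extra_columns]
    buf = io.StringIO()
    # extrasaction="ignore" silently drops keys that were not selected.
    writer = csv.DictWriter(buf, fieldnames=columns, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up flattened row for demonstration.
SAMPLE_ROWS = [{
    "subject": "CHEBI:6801",
    "predicate": "biolink:treats",
    "object": "MONDO:0005148",
    "publications": "PMID:0000001|PMID:0000002",
    "primary_knowledge_source": "infores:ctd",
    "aggregator_knowledge_sources": "infores:automat-robokop",
}]

if __name__ == "__main__":
    print(rows_to_csv(SAMPLE_ROWS))
    print(rows_to_csv(SAMPLE_ROWS, extra_columns=["primary_knowledge_source"]))
```

Called with no extras, this yields only the minimal four columns; each ticked box adds one column without changing the rest of the file.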