IFRCGo / go-api

MIT License
14 stars 6 forks source link

Simplify CSV Output for Event end-point #708

Closed batpad closed 4 years ago

batpad commented 4 years ago

Currently, the emergency API end-point, when returned as CSV, generates separate columns for every field report attached to the emergency. This is creating an inordinate and unmanageable amount of columns.

We should simplify the CSV output for the Emergency API so that it only returns summarized information for the field reports - perhaps just two fields:

These changes should only happen on the CSV output, and we should leave the JSON output the same (we do this similarly to how we handle the Field Report serializer conditionally based on the output format).

@nanometrenat please let me know if this ticket is duplicated somewhere, could not find an existing ticket.

cc @GregoryHorvath is this something you might be able to take on?

nanometrenat commented 4 years ago

Thanks @batpad that sounds great - I hadn't realised that it was simple to treat csv output differently to json - excellent solution. Backstory is in this comment on another ticket: https://github.com/IFRCGo/go-frontend/issues/1161#issuecomment-626391193

Above 2 fields sound good. Re the first one, if just as easy to calculate, then Number of Field Reports would be even better (then we know if it's zero/null then there are no Field Reports attached, otherwise can display a number). But if that is at all complicated or additional processing then we can do the processing by counting up the ids in the csv list.

The other thing that would be amazing (though would be an additional requirement, and may not be feasible, so not critical if too hard/risky), would it be possible to add an additional new field Has COVID-19 Field Report ? Would be really helpful for our dashboards but we can always do a join if not poss. Many thanks

nanometrenat commented 4 years ago

Interestingly, I did not realise that the "mini" event endpoint was even an option! Interested to know how/where it is used: https://goadmin.ifrc.org/docs/#event-mini_events

nanometrenat commented 4 years ago

Following Monday's meeting, standing by for a proposed new schema for review - thanks!

kamicut commented 4 years ago

So for the Events API, the included data that are larger objects are the following:

For countries, we can use the database id as well as an iso code or country name for legibility. I'm not too familiar with the structure of the Appeals and Field Reports but they each have a separate aid and rid that is different from the id.

Here's a sample output of the CSV with the current code:

appeals.id auto_generated countries.id countries.name created_at disaster_start_date dtype.id dtype.name dtype.summary field_reports.id glide id ifrc_severity_level is_featured is_featured_region name num_affected parent_event slug summary updated_at
TRUE 14, 285 Afghanistan, Africa Region 2020-05-26T04:22:35.314124Z 2020-05-13T00:00:00Z 15 Fire 11806 3559 Green FALSE FALSE a 2020-06-05T02:52:01.017850Z
TRUE 1 Ukraine 2019-05-21T09:50:01.315999Z 2019-05-21T09:50:01.192943Z 12 Flood 11805 3558 Green FALSE FALSE Flooding in Western Ukraine 2019-05-21T09:50:01.316015Z
TRUE 71 Georgia 2019-05-17T15:01:16.893454Z 2019-05-17T15:01:16.790122Z 27 Pluvial/Flash Flood 11804 3557 Green FALSE FALSE Floods in Kakheti Region, Georgia 2019-05-17T15:01:16.893472Z
TRUE 62 El Salvador 2019-05-17T13:44:57.703203Z 2019-05-17T13:44:57.468765Z 2 Earthquake 11803 3556 Green FALSE FALSE SISMO magnitud 5.9 con replicas en El Salvador 2019-05-17T13:44:57.703218Z
TRUE 112 Mali 2019-05-16T21:30:46.512665Z 2019-05-16T21:30:46.228676Z 12 Flood 11802 3555 Green FALSE FALSE Ce jeudi 16 mai 2019, une grande inondation à Bamako: bilan provisoire: 14 morts . 2019-05-16T21:30:46.512679Z
TRUE 222 Israel 2019-05-15T17:59:40.011132Z 2019-05-15T17:59:39.947733Z 2 Earthquake 11798 3551 Green FALSE FALSE On May 15th at 19:53 an earthquake was felt in Israel, no reports of damage or casualties. 2019-05-15T17:59:40.011147Z
TRUE 194 Mongolia 2019-05-15T07:34:27.182236Z 2019-05-15T07:34:26Z 14 Cold Wave 11797 3550 Green FALSE FALSE 6 dead in an active snowstorm in Mongolia 500 2019-05-16T15:33:03.391751Z
TRUE 76 Guatemala 2019-05-14T23:24:54.694277Z 2019-05-14T23:24:54.179557Z 20 Drought 11796 3549 Green FALSE FALSE Sequía en Guatemala 2019-05-14T23:24:54.694294Z
batpad commented 4 years ago

@kamicut this is looking good to me! @nanometrenat could you look at the CSV sample above, and happy to have a follow-up chat any time.

@kamicut I see that in the CSV posted, there is only always one field report, country, etc. - is it possible to output with an example where there are multiple ids / names? I assume these values will be comma-separated inside a single cell?

GergiH commented 4 years ago

@kamicut :+1: I like the latest approach you went in the latest code, so I'm following that for the Tableau WDC. I'd say we should definately include summary for the fieldreports and name for the appeals.

cc @batpad

LukeCaley commented 4 years ago

@nanometrenat could you review and comment on the schema proposed please?

batpad commented 4 years ago

The new CSV schema for Events is on staging - it will no longer return multiple columns for related items. Would be great to get a review, and also some discussion on how we inform users, etc.

LukeCaley commented 4 years ago

@nanometrenat would you have time to UAT this please? If possible also to work with @JonathanGarro on documenting and communicating the changes to users.

batpad commented 4 years ago

@LukeCaley - this ticket and @kamicut's example CSV output above describes the changes to the Emergencies CSV end-point. We have made the output always a fixed amount of columns, and returned things like country ids as comma separated values. This should make general CSV processing much more convenient, and will avoid problems we were having with a massive amount of columns in the output CSV.

This will also improve performance of the CSV end-point. This is however, a breaking API change and anyone depending on the Emergencies CSV export will need to account for this change.

Happy to work with @JonathanGarro on figuring out how best to communicate these changes to users.