BiologicalRecordsCentre / PoMS

repository for Pollinator Monitoring Scheme website
0 stars 0 forks source link

Data downloads for individual users #33

Closed CCarvell closed 1 month ago

CCarvell commented 3 years ago

To confirm that individual users will be able to access their own FIT Count data to download in csv format. This would allow groups of individuals, if they wish, to come together to pool their counts if part of a project where they want to view a set of counts collectively (e.g. to compare with national averages or to compare between their own sites or years). Examples who may want to use the app/ FIT Count to do this are: Earthwatch Buglife B lines Pollinating London Together Farmer Clusters/ Estates National Trust

DavidRoy commented 3 years ago

These are two different requirements.
For user download - this is requirement for the initial website development

I suggest the second (organisational/group) requirement is a separate (much more complex) issue

CCarvell commented 3 years ago

David, I am not yet talking about the organisational/group requirement. The above relates to one requirement for individual users. What I am suggesting is that if a group of users want to pool their own counts (ie. download them as individuals when logged into their own accounts, then compile a set of counts offline, in their own time and without additional input from PoMS) then the system would not prevent them from doing that. Of course, there is room for error, and I can't imagine many groups actually seeing it through, but that is not something for us to worry about. I'm getting lots of queries about this (ie. using FIT Counts as a tool to monitor change), so trying to set out what is going to be possible and manage expectations.

CCarvell commented 3 years ago

@DavidRoy the RSPB/ Buglife project at Hope Farm is another example - if submitting counts via the app, and wanting to provide feedback to the farmer/s, they would have to download separately each relevant count and summarise the data themselves.

kitenetter commented 3 years ago

@andrewvanbreda as per above, we need to enable users to download the data for the public FIT Counts they have contributed. I have a nasty feeling that wasn't in our original specification documentation.

I don't think we need to enable users to download data from the 1km surveys, certainly not in the intial version of the website.

In the specification we do have "A1.5 Data download function" to allow for a more general download of data, but that is not needed for the intial website launch.

andrewvanbreda commented 3 years ago

@CCarvell @kitenetter @DavidRoy Sorry I missed this issue and raised another one. I already enabled grid Download This Report option in bottom right of the My Records page grid. In addition to this, I have code running on my machine that allows a user to draw onto the map and filter the results to that area.

That can work on FIT count map page too and doesn't require the user to have to open the usual Indicia filtering panel, they just click icon and draw. I will get that put onto the site very soon, just need reports pulled onto site.

kitenetter commented 3 years ago

This needs some discussion. At the moment, the grid download from "My records" outputs the following columns:

What I would ideally like us to output is this:

Note that some sort of pivot table output is needed to report on the insect group counts in this way.

Need to decide what is possible to do for launch, and how best to take this forward.

andrewvanbreda commented 3 years ago

@kitenetter OK well I will check spec, but easy download Indicia page is driven by a report, so could just drive this from a report. Prob not too difficult, although am not sure whether that page can support the map draw....I don't think so.

andrewvanbreda commented 3 years ago

@kitenetter Actually another thought, I think i have seen in the past that the grid download option downloads hidden columns (so extra columns not shown on grid that are deliberately hidden). If so it would be possible from the same download option I already setup, I will take a look

andrewvanbreda commented 3 years ago

@kitenetter I checked and that is indeed the case. Providing I can code the report grid so it isn't too slow, I think these extra columns could be added to the existing download I setup.....although there is a lot of LEFT Joining involved. I can code that, and if it is too slow for the grid, we can move the same report to drive an Easy Download page

DavidRoy commented 3 years ago

Or extract the attributes from the json field in the cache tables? To negate the need for many joins?

andrewvanbreda commented 3 years ago

@DavidRoy Yes that is true, I keep forgetting we have them there now for sample/occurrences, I can do that I think. I will take a look.

kitenetter commented 3 years ago

@andrewvanbreda as discussed, here is the SQL that we've been using to download PoMS data via pgAdmin. What I haven't been able to do in pgAdmin is to incorporate the 'pivot' approach so that the insect group occurrences can be included alongside the sample attributes.

Sample attributes:

SELECT s.id AS "sampleid", (case when snf.privacy_precision is null then s.location_name else 'Sensitive. Lat long is approximate.' end) AS "locationname", (case snf.entered_sref_system when '4326' then 'WGS84' when '27700' then 'OSGB36' else upper(snf.entered_sref_system) end) AS "sampleprojection", snf.public_entered_sref AS "enteredspatialref", (get_sref_precision(snf.public_entered_sref, snf.entered_sref_system, snf.attr_sref_precision)) AS "originalprecision", (SELECT string_agg(vc.code, '; ') FROM locations vc WHERE vc.location_type_id=(SELECT id FROM cache_termlists_terms WHERE termlist_title='Location types' AND term='Vice County') AND vc.name NOT LIKE '%+%' AND vc.id = ANY(s.location_ids)) AS "vcnumber", (SELECT string_agg(vc.name, '; ') FROM locations vc WHERE vc.location_type_id=(SELECT id FROM cache_termlists_terms WHERE termlist_title='Location types' AND term='Vice County') AND vc.name NOT LIKE '%+%' AND vc.id = ANY(s.location_ids)) AS "vicecounty", (SELECT string_agg(ctry.name, '; ') FROM locations ctry WHERE ctry.location_type_id=(SELECT id FROM cache_termlists_terms WHERE termlist_title='Location types' AND term='Country') AND ctry.name NOT LIKE '%+%' AND ctry.id = ANY(s.location_ids)) AS "country", (extract(year from s.date_start)) AS "year", s.date_start AS "datefrom", s.date_end AS "dateto", s.date_type AS "datetype", rtrim(snf.recorders, ', ') AS recorder, snf.comment AS "samplecomment", s.created_on AS "dateadded", s.updated_on AS "datelastedited", (SELECT snf.attrs_json ->> '1048' AS "habitat"), (SELECT snf.attrs_json ->> '1049' AS "otherhabitat"), (SELECT snf.attrs_json ->> '1050' AS "targetflower"), (SELECT snf.attrs_json ->> '1051' AS "otherflower"), (SELECT snf.attrs_json ->> '1052' AS "coverwithinquadrat"), (SELECT snf.attrs_json ->> '1053' AS "totalfloralunits"), (SELECT snf.attrs_json ->> '1054' AS "floralunit"), (SELECT snf.attrs_json ->> '1055' AS "quadratcontext"), (SELECT snf.attrs_json ->> '1056' AS "countstarttime"), (SELECT snf.attrs_json ->> '1057' AS "cloudcover"), (SELECT snf.attrs_json ->> '1058' AS "wind"), (SELECT snf.attrs_json ->> '1061' AS "quadratsunshine"), ti.total_insects AS "totalinsects" FROM cache_samples_functional s JOIN cache_samples_nonfunctional snf ON s.id = snf.id LEFT JOIN (SELECT s.id AS "sample_id", sum(cast(onf.attr_sex_stage_count AS integer)) AS "total_insects" FROM cache_samples_functional s JOIN cache_occurrences_functional o ON s.id = o.sample_id AND o.training = FALSE JOIN cache_occurrences_nonfunctional onf ON o.id = onf.id JOIN cache_taxa_taxon_lists cttl on cttl.id = o.taxa_taxon_list_id AND cttl.taxon_group LIKE 'Pollinator%' WHERE s.survey_id = 467 GROUP BY s.id) ti ON s.id = ti.sample_id WHERE s.survey_id = 467 ORDER BY s.id DESC;

Insect group occurrences:

SELECT s.id AS "sampleid", o.id AS "occurrenceid", cttl.preferred_taxon AS taxon, cttl.taxon_group AS "taxongroup", (SELECT onf.attrs_json ->> '666' AS "insectcount"), (case when onf.output_sref IS NOT NULL then onf.output_sref ELSE snf.public_entered_sref end) AS "spatialref", (case when onf.output_sref IS NOT NULL then (case onf.output_sref_system when '4326' then 'WGS84' when '27700' then 'OSGB36' else upper(onf.output_sref_system) end) else (case snf.entered_sref_system when '4326' then 'WGS84' when '27700' then 'OSGB36' else upper(snf.entered_sref_system) end) end) AS "projection", snf.public_entered_sref AS "enteredspatialref", (get_sref_precision(snf.public_entered_sref, snf.entered_sref_system, snf.attr_sref_precision)) AS "originalprecision", (SELECT string_agg(vc.code, '; ') FROM locations vc WHERE vc.location_type_id=(SELECT id FROM cache_termlists_terms WHERE termlist_title='Location types' AND term='Vice County') AND vc.name NOT LIKE '%+%' AND vc.id = ANY(s.location_ids)) AS "vcnumber", (SELECT string_agg(vc.name, '; ') FROM locations vc WHERE vc.location_type_id=(SELECT id FROM cache_termlists_terms WHERE termlist_title='Location types' AND term='Vice County') AND vc.name NOT LIKE '%+%' AND vc.id = ANY(s.location_ids)) AS "vicecounty", (SELECT string_agg(ctry.name, '; ') FROM locations ctry WHERE ctry.location_type_id=(SELECT id FROM cache_termlists_terms WHERE termlist_title='Location types' AND term='Country') AND ctry.name NOT LIKE '%+%' AND ctry.id = ANY(s.location_ids)) AS "country", (extract(year from s.date_start)) AS "year", s.date_start AS "datefrom", s.date_end AS "dateto", s.date_type AS "datetype", (substring(st_astext(st_transform(st_centroid(o.public_geom), 4326)) from E'POINT\\(.+ (.+)\\)')) AS latitude, (substring(st_astext(st_transform(st_centroid(o.public_geom), 4326)) from E'POINT\\((.+) ')) AS longitude FROM cache_samples_functional s JOIN cache_samples_nonfunctional snf ON s.id = snf.id LEFT JOIN cache_occurrences_functional o ON s.id = o.sample_id AND o.training = FALSE LEFT JOIN cache_occurrences_nonfunctional onf ON o.id = onf.id LEFT JOIN cache_taxa_taxon_lists cttl on cttl.id = o.taxa_taxon_list_id WHERE s.survey_id = 467 ORDER BY s.id DESC, o.id DESC;

kitenetter commented 3 years ago

@andrewvanbreda As discussed, users should be able to download the full set of attrbutes (as per the bulleted list above) for their own data, but we still need to confirm what the output should be for users downloading other people's data via the polygon select. Claire and I are working on that and we will confirm it as soon as we can next week.

andrewvanbreda commented 3 years ago

@BirenRathod I have some new commits for various reports Could you pull the reports in projects/PoMS/ from indicia-reports onto live please.

Thanks

BirenRathod commented 3 years ago

@andrewvanbreda I have pulled all reports now.

andrewvanbreda commented 3 years ago

@BirenRathod Great thanks, we take a look right now

andrewvanbreda commented 3 years ago

@BirenRathod Hi Biren, are you able to pull those PoMS reports again please as further changes have been made. Thanks

BirenRathod commented 3 years ago

@andrewvanbreda they have pulled on live now.

andrewvanbreda commented 3 years ago

@BirenRathod Brilliant, thanks Biren

andrewvanbreda commented 3 years ago

@BirenRathod Hi Biren, Would you be able to pull these reports again thanks. I have now added language support for the termlists_terms (although not columns yet). Thanks

BirenRathod commented 3 years ago

@andrewvanbreda I have pulled those reports on live now.

andrewvanbreda commented 3 years ago

@BirenRathod Thanks Biren :)

kitenetter commented 3 years ago

The download for users' own data looks like it is working well, thanks.

In the thread above we said "we still need to confirm what the output should be for users downloading other people's data via the polygon select" - I'm going to put that in a new issue #93

kitenetter commented 3 years ago

The current users' download includes a column for "Pan-trap number" - this is not needed for the FIT Counts and can be dropped.

The column "Photo of target flower" can be dropped (the file name it contains will not be meaningful to users).

We need to add a column for "Other flower name" (this is used whenever "Other flower" is chosen for the target flower).

If possible we should translate lat/long geolocations (from the app) into grid refs, providing a "Spatial ref output" as well as a "Spatial ref entered" column, as is done in iRecord. But maybe that will have to go on the list for future development?

andrewvanbreda commented 3 years ago

@BirenRathod Are you able to pull these reports again, lot's of changes. Some reports are renamed, so just pull the whole PoMS folder. Thanks

BirenRathod commented 3 years ago

@andrewvanbreda I have now pulled those reports on live warehouse.

andrewvanbreda commented 3 years ago

@BirenRathod Thanks Biren, will check in a bit.

andrewvanbreda commented 3 years ago

@BirenRathod Are you able to pull the PoMS folder again please, I made quite a few changes yesterday although there will be some more later) Cheers

BirenRathod commented 3 years ago

@andrewvanbreda I have pulled those reports this morning as I was doing same for other reports. It is already there.

andrewvanbreda commented 3 years ago

@BirenRathod Can I get you to pull the PoMS reports folder again please. Lots of new versions of the reports hopefully works better with multi-lingual. Cheers.

andrewvanbreda commented 3 years ago

@BirenRathod Would you be able to pull the PoMS report folder when you get a chance, thanks, as I need to check the reports on the site. Cheers

BirenRathod commented 3 years ago

@andrewvanbreda Done.

andrewvanbreda commented 3 years ago

@BirenRathod Thanks Biren, Ok that is appearing, darn, not picking up the welsh data. Will get back to you

andrewvanbreda commented 3 years ago

@BirenRathod Hi Biren, Would you be able to send me these two results so I can try to see what is going on. Cheers

select * from indicia.sample_attribute_values where sample_attribute_id = 1050 and sample_id = 13803554 and deleted=false;

select * from indicia.cache_termlists_terms where id in (select int_value from indicia.sample_attribute_values where sample_attribute_id = 1050 and sample_id = 13803554 and deleted=false);

BirenRathod commented 3 years ago

@andrewvanbreda Here are the results. POMS.zip

andrewvanbreda commented 3 years ago

@BirenRathod Aha, thanks I can see the issue. Will need to change the way the reports work a bit.

andrewvanbreda commented 3 years ago

@BirenRathod Hi Biren, Hopefully these will be fixed. Can you pull the folder again whenever you get a chance. Cheers

BirenRathod commented 3 years ago

@andrewvanbreda Done.

andrewvanbreda commented 3 years ago

@BirenRathod Thanks for quick response. Ok not working still, think will have to look later at this, just trying to do this quickly. Obviously making a mistake somewhere.

andrewvanbreda commented 3 years ago

@BirenRathod There was a silly reoccurring bug in an area I wasn't expecting. Have seen working on my machine. Can you pull the PoMS reports when you get a second please as have fixed now, thanks.

BirenRathod commented 3 years ago

@andrewvanbreda All reports have now been pulled down.

andrewvanbreda commented 3 years ago

@BirenRathod This still isn't working, but have tested again on my machine and it is fine, and there is no apparent reason for this situation. Could you run this sql please to help me debug. Thanks (sorry for bugging you so much)

SELECT s.id as sample_id,csnf.id as sample_nonfunct_id,cof.id as cof_id, conf.id as conf_id, 
sav_cover.id as save_id,ctt_cover.id as main_term_id,ctt_cover.term as main_term,ctt_cover_lang.id as lang_term_id,ctt_cover_lang.term as lan_term
  FROM indicia.cache_samples_functional s
  LEFT JOIN indicia.cache_samples_nonfunctional csnf on csnf.id = s.id
  LEFT JOIN indicia.cache_occurrences_functional cof on cof.sample_id = s.id
  LEFT JOIN indicia.cache_occurrences_nonfunctional conf on conf.id = cof.id

  LEFT JOIN indicia.sample_attribute_values sav_cover on sav_cover.sample_id = csnf.id 
      AND sav_cover.sample_attribute_id = 1048
      AND sav_cover.deleted = false
  LEFT JOIN indicia.cache_termlists_terms ctt_cover on ctt_cover.id = sav_cover.int_value
  LEFT JOIN indicia.cache_termlists_terms ctt_cover_lang on ctt_cover_lang.meaning_id = ctt_cover.meaning_id AND ctt_cover_lang.language_iso = 'cym' where s.id = 13803554;
BirenRathod commented 3 years ago

@andrewvanbreda Here is the result. data-1621432842867_CSF.zip

andrewvanbreda commented 3 years ago

@BirenRathod Darn, this shows the report should be working. The PoMS folder is def pulled up to date? Will see what happens if I remove the quote from the welsh. Don't understand this

andrewvanbreda commented 2 years ago

@BirenRathod @kitenetter I don't know what the problem is with the Welsh, as I don't see that problem anymore, the downloads have Welsh in them (although a lot of the reports have changed so that might explain it). Biren could you pull these new reports which will correct the problem where the Other Target is never displayed. Whenever you get chance, not urgent. Cheers

reports/projects/PoMS/blurred_poms_results_cache_3.xml
reports/projects/PoMS/blurred_poms_results_lingual_3.xml
reports/projects/PoMS/blurred_poms_results_mapping_lingual_3.xml
reports/projects/PoMS/my_poms_results_in_detail_lingual_3.xml
reports/projects/PoMS/poms_results_in_detail_admin_cache_3.xml
reports/projects/PoMS/poms_results_in_detail_admin_lingual_3.xml

@kitenetter , am going to have to leave the Photo column in the download. I can't remove columns for the download and keep them in the grid without lots of custom coding. If you really need this, raise this separately as think it would need extra budget.

kitenetter commented 1 month ago

As far as I'm aware the individual user downloads are working as expected. Closing.