Closed xaviermestdagh closed 1 year ago
Hi,
I found some documentation which says
Note that if requesting an event attribute value, the parent events attribute values will also be included in the output, so when requesting an attribute value it is not necessary to know if the value will be stored at the event or parent level.
This leads me to believe that 20; 20 represents the percentage sun for the transect section combined with the overall percentage sun for the walk. I haven't yet worked out which order they are in.
For the anomalous records I find they have two entries in the ElasticSearch index. E.g.
This is wrong. Once this has been fixed (which I also don't know how to do yet) I think all records will show the two sun values.
No, it was me that was wrong. There are two versions of the record because it is marked as sensitive. I think one version will contain full details of the record while the other may hide some. Whether it is intentional to hide the parent attributes I am not sure.
I'm thinking this is probably a bug.
Both list_for_elastic_all.xml and list_for_elastic_sensitive_all.xml provide parent_sample_attrs_json to LogStash but, while indicia_support_files\Elasticsearch\logstash-config\occurrences-http-indicia.conf processes this into a JSON field which ends up in event.parent_attributes, occurrences-http-indicia-sensitive.conf does nothing with it, so it ends up as text in a field called parent_sample_attrs_json of the ElasticSearch index.
Similarly, the Transect code is missing from the download because occurrences-http-indicia-sensitive.conf does nothing with recorded_parent_location_code so it ends up as text in a field of the same name rather than being known as location.parent.code. Likewise recorded_location_code is also not renamed.
The download is designed to extract records at full precision so no details of an EBMS record should be hidden.
@johnvanbreda This is unfamiliar territory for me. I'm pretty confident I've found the issue that the LogStash configuration for sensitive records has not been kept up to date with changes elsewhere. We've picked up 3 fields here which need correcting. Maybe there are others you will know about. I guess we will then need to re-index all sensitive records.
Getting back to this issue (which was described before in #489 ), this value for %Sun causes problems for users downloading the CSV documents because it includes a semi-colon between the sun values. I don't fully understand the bug that you found @JimBacon, and in any case I don't understand why two values of Sun are included in the same column.
I would like to separate the two values (20; 20) into two columns, one representing the percentage of sun for the transect section and the other the overall percentage of sun for the walk.
It may look simple to work around the semi-colon when opening the downloaded file, but typical eBMS users don't have many data management skills. Therefore, if we can remove the semi-colon from the download files, users can easily open files downloaded from eBMS directly in Excel.
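As a stop-gap for files that already contain combined values, something like the following Python sketch could split the column after download. It is only an illustration: `split_combined` and `split_column` are hypothetical helpers, the sample CSV is invented, and the new columns are simply numbered because the thread has not established which of the two values is the section and which is the walk.

```python
import csv
import io


def split_combined(value):
    """Split a combined cell such as '20; 20' into its stripped parts."""
    return [part.strip() for part in (value or "").split(";") if part.strip()]


def split_column(csv_text, column):
    """Replace `column` with numbered columns, one per combined value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fieldnames = list(reader.fieldnames or [])

    # Widest cell decides how many numbered columns are needed (at least one).
    width = max([1] + [len(split_combined(row[column])) for row in rows])
    new_cols = [f"{column} ({i + 1})" for i in range(width)]

    idx = fieldnames.index(column)
    out_fields = fieldnames[:idx] + new_cols + fieldnames[idx + 1:]

    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=out_fields, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        parts = split_combined(row.pop(column))
        for i, name in enumerate(new_cols):
            row[name] = parts[i] if i < len(parts) else ""
        writer.writerow(row)
    return buf.getvalue()


# Invented sample rows, not real eBMS data.
sample = "Species,% Sun\nA,20; 20\nB,20\n"
print(split_column(sample, "% Sun"))
```

This only hides the symptom; it does not tell the user what each value means, which is the real question here.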
@johnvanbreda was mentioned before regarding this issue.
The Slovenian coordinator is checking data from 2022 and found the way %sun is presented in the download file strange:
"in the excel (you downloaded data for me) for sun (%) it's written "60;80" and for cloud (%) "40". But if I take a look into database, the data show 80 % of sun for sections from 1-5 and 60 for section 6. So, what does the excel show in the column "sun" and "cloud" - how can one read the data from excel without being able to see the data entry in the eBMS app?"
In this case, when the value differs between sections, it is not reported well and it is confusing. I would like to show the correct value for each section and put the overall percentage of sun in another column, to avoid two values in the same column. At the moment the users don't know what the two values mean. Any idea how I can answer the last question from the Slovenian coordinator?
Thank you in advance. @JimBacon @DavidRoy
I totally support this comment; Harriet and I discussed it yesterday. The %sun and %cloud values are very confusing to our users. I understand it could be that the transect is sunny while most of the sky is cloudy, but is the general cloudiness important relative to how sunny the actual transect walk is? Getting two values that don't necessarily sum to 100 is one part of the confusion. If they are designed to sum to 100, do we then need both of them?
@JimBacon The requirement here is:
@larspett we have both %sun and %cloud because schemes have different guidance on which to record! They auto calculate depending on what’s entered and sum to 100. So we need to keep for ebms but can simplify the same form on the spring website
@DavidRoy having both is rather confusing. I would prefer a toggle between them or a scheme-dependent visibility, depending on which is the easiest solution. You could lock the visibility in a scheme-dependent fashion in Drupal.
Hello @JimBacon and @kazlauskis
(sorry I don't know if directly related to this issue)
An observer (Youri Martin) using the mobile app on iPhone highlighted that the %Cloud at sample level (not section level) is not kept after uploading to the web app (the field %Sun is empty when editing). On my side, when I make an export with "Scheme admin > Downloads", neither the %Sun nor the %Cloud column is included. In the export through "My annual report", the column %Sun is empty but values are provided in the column %Cloud (which means the data exists somewhere...).
See for instance the walks of 2022 on EBMS:Luxembourg:166 (Haardt sud).
Thinking the info was lost, the observer already opened many other walks one by one on the web app to add the missing value...
@Vilius-Stankaitis can you check if the app uploads the Cloud values and if it does, then add the warehouse attribute ID here for Jim?
Yes, the app uploads only Cloud values
Section level seems to be working
Sample level not working, but the cloud value was uploaded.
cloud warehouse attribute id: 1457
@Vilius-Stankaitis can the app also submit the %sun value (= 100 - %cloud)? That seems simpler than a process that runs on the warehouse.
Yes, it would be easy to upload the sun value, too. It is a bit strange that the survey needs both values, though.
Agreed, the difficulty is that some schemes use %cloud and some use %sun. The UK use sun as they are happier reporting sun; Spain use cloud as they are happier reporting cloud :-)
Still need to resolve the download issues, I think.
There is an additional problem to resolve before we can output %Sun and %Cloud at the top level (I'll call it walk level, like the website does).
The website and the warehouse have been set up to record %Sun. It looks like, for a couple of years, the app has been submitting %Cloud, although it is now submitting %Sun and %Cloud as requested above.
What this means, as noted previously, is that if you edit an older app record on the website, the %Sun field is empty and the %Cloud is not shown. If you enter a value for %Sun there is a good chance it won't be (100 - %Cloud) and we get an inconsistency in our data. (It was a bad idea to store both sun and cloud - we should have stored one and deduced the other so there would never be a consistency issue.)
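The consistency rule discussed here (%Sun = 100 - %Cloud) can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not code from the app, the website or the warehouse; the function name `reconcile` and the example values are assumptions.

```python
def reconcile(sun, cloud):
    """Return (sun, cloud, consistent).

    A missing value (None) is deduced as 100 minus the other one.
    When both are present, they are only flagged, never overwritten.
    """
    if sun is None and cloud is None:
        return None, None, True
    if sun is None:
        return 100 - cloud, cloud, True
    if cloud is None:
        return sun, 100 - sun, True
    return sun, cloud, sun + cloud == 100


# Older app record: only %Cloud was uploaded, so %Sun can be deduced.
print(reconcile(None, 40))
# Record edited on the website with a %Sun that does not match the stored %Cloud.
print(reconcile(50, 40))
```

Storing only one of the two values and deducing the other on output would make the second case impossible.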
I can see three possible resolutions.
@DavidRoy, what is your preference?
By the way, I guess this has gone undetected because %Sun and %Cloud are not present in either Scheme Admin > Downloads > Download Sample Information from Transects or My Data > My Downloads > Download Sample Information from Transects so I could ask whether it is needed at all!
@chrisvanswaay @CrisSevilleja what is your view on this? An alternative is to remove the %sun (or %cloud) from the transect sections data entry and just have it at the walk level. This was inherited from the UKBMS but I've never seen this data analysed so wonder what the point of collecting it is! What is done in the Netherlands and in the eBMS manual?
@DavidRoy We only collect weather data at the visit level for the whole transect, and that sounds like the logical thing for the 15 min counts.
@chrisvanswaay we already collect at the sample/visit level for 15 min counts. This discussion is about the transects. Do you agree that we should remove the %sun requirement at the section (sub-sample) level (after checking with the schemes)?
@DavidRoy Sorry for the misunderstanding, I had just arrived at the hotel (in Bayreuth) and did not have enough time to go through the whole discussion.
Added a fix to allow parent (Walk) and child (Section) attributes to be output independently and not in the same column.
Changed configuration of occurrence download for Scheme Admin>Downloads and My Data>My Downloads, replacing
{"caption":"% Sun","field":"#attr_value:event:1387#"},
{"caption":"% Cloud","field":"#attr_value:event:1457#"},
with
{"caption":"Walk % Sun","field":"#attr_value:parent_event:1387#"},
{"caption":"Walk % Cloud","field":"#attr_value:parent_event:1457#"},
{"caption":"Section % Sun","field":"#attr_value:event:1387:noparent#"},
{"caption":"Section % Cloud","field":"#attr_value:event:1457:noparent#"},
This resolves one part of this issue, meaning that %Sun will no longer contain two values like "20;20".
That is awesome, Jim, because that generates a lot of problems with the downloads, thanks!
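For anyone trying to picture what the three field forms do, here is a small Python model of the behaviour as described in this thread. It is not the warehouse implementation: the function `resolve`, the sample values, the "; " separator and the order of the combined values (section first, then walk) are all assumptions made for illustration.

```python
def resolve(token, section_attrs, walk_attrs):
    """Model how a download field token picks attribute values.

    section_attrs / walk_attrs map attribute IDs to values for the
    section (child event) and the walk (parent event).
    """
    _, entity, attr_id, *flags = token.strip("#").split(":")
    attr_id = int(attr_id)

    if entity == "parent_event":
        # Walk-level value only.
        values = [walk_attrs.get(attr_id)]
    elif entity == "event" and "noparent" in flags:
        # Section-level value only.
        values = [section_attrs.get(attr_id)]
    elif entity == "event":
        # Old behaviour: section value and walk value share one column.
        # The order used here is an assumption.
        values = [section_attrs.get(attr_id), walk_attrs.get(attr_id)]
    else:
        raise ValueError(f"Unsupported token: {token}")

    return "; ".join(str(v) for v in values if v is not None)


# Invented values for attribute 1387 (% Sun).
section = {1387: 60}
walk = {1387: 80}
print(resolve("#attr_value:event:1387#", section, walk))
print(resolve("#attr_value:parent_event:1387#", section, walk))
print(resolve("#attr_value:event:1387:noparent#", section, walk))
```

With the new configuration each caption maps to exactly one value, so the semi-colon never appears in the output.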
Regarding what to do with the % of sun or clouds: I quickly asked regional coordinators in Spain and they don't think the value per section is needed. In general, we recommend that volunteers take the starting and ending values and average them (this is also in the manual). If we can reduce the complexity by keeping only the average value, let's go for that, but we would need to change the website and app data entry.
In terms of preference, the majority of schemes in Europe prefer % of clouds rather than %sun. Cheers
I agree %Cloud at walk level is great for Luxembourg!
I have fixed the way that ElasticSearch indexes sensitive records as proposed above. This fills in the missing transect code and walk %sun/cloud values of records marked as sensitive.
For comparison with the original screenshot which raised this issue, here is what the first few lines of the "download species occurrence from transects" now look like.
I have added the walk-level %Sun and %Cloud to "download sample (visit) information from transects" for both My Data > My Downloads and Scheme Admin > Downloads as requested above.
I am going to close this issue as its original problem has been resolved, and I will raise a new issue for sorting out just how we want to record %Sun.
Hi, what is the meaning of, for instance, the value "20; 20", and why is it sometimes just "20" for some records (missing Transect Code value)?