GLIMS-RGI / glims_issue_tracker

Track issues about the GLIMS database and web services

download from GLIMS in "RGI Format" (nunataks as holes, not as extra polygons) silently removes glaciers #2

Open fmaussion opened 3 years ago

fmaussion commented 3 years ago

All in the title - the number of glaciers should be the same regardless of the download type.

fmaussion commented 3 years ago

I think that this bug should be mentioned on the download website as a big warning: "Downloading data with RGI format may not yield all outlines available in GLIMS"

fmaussion commented 2 years ago

@bruceraup I think that users should be made aware of this problem with a warning on the download portal until it is solved: when they choose "RGI format" as the download option (the default), it is possible that GLIMS won't provide them with all the data they requested.

fmaussion commented 2 years ago

Trello: https://trello.com/c/L86Yv2wQ

betolink commented 2 years ago

Hi @fmaussion, I'm going to take a look at why this is happening. Do you have a glacier id that I can use for testing this behavior?

fmaussion commented 2 years ago

@betolink I'm on it, but in the meantime a good candidate to check would be the entity mentioned here: https://github.com/GLIMS-RGI/glims_issue_tracker/issues/4

As I mentioned in the corresponding Trello card, as long as these errors remain in GLIMS, the "download as RGI" option cannot work properly, so in terms of priority I would recommend sanitizing GLIMS first before attempting to fix this.

fmaussion commented 2 years ago

@betolink other examples:

image

image

I count 88 missing entities in subm_id 624 alone.

betolink commented 2 years ago

Thanks Fabien, your guess is that the misclassified orphan rocks are the ones causing the bug, correct? I'll take a look at these examples and the DB to understand what's going on. P.S. I don't have access to this Trello board.

fmaussion commented 2 years ago

your guess is that the misclassified orphan rocks are the ones causing the bug, correct?

Partly. I think the bug is several bugs.

In any case, every conversion script in GLIMS needs some safeguards after conversion, i.e. the total number of glac_bound entities should be the same before and after conversion to the RGI format.
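A minimal sketch of such a safeguard, assuming the glacier IDs of the glac_bound outlines can be listed before and after conversion. The function and exception names are hypothetical and not part of any GLIMS code; the check only compares ID lists and says nothing about how the conversion itself works.

```python
from collections import Counter


class ConversionCountError(ValueError):
    """Raised when a conversion changes the set of glacier outlines."""


def check_outline_counts(before_ids, after_ids):
    """Compare glac_bound outlines before and after a format conversion.

    Both arguments are iterables holding one glacier ID per glac_bound
    outline (hypothetical inputs; how they are read from the database
    or the download is left open). Returns None when nothing was lost
    or added, and raises ConversionCountError otherwise.
    """
    before = Counter(before_ids)
    after = Counter(after_ids)
    missing = before - after  # outlines present before, absent after
    extra = after - before    # outlines that only appear after conversion
    if missing or extra:
        raise ConversionCountError(
            f"{sum(missing.values())} outline(s) dropped, "
            f"{sum(extra.values())} outline(s) added; "
            f"dropped IDs: {sorted(missing)}"
        )


if __name__ == "__main__":
    # Made-up IDs for illustration only.
    check_outline_counts(["g1", "g2", "g3"], ["g3", "g1", "g2"])  # passes silently
    try:
        check_outline_counts(["g1", "g2", "g3"], ["g1", "g3"])
    except ConversionCountError as err:
        print(err)
```

Comparing ID multisets rather than bare totals also reports which entities went missing, which would help with cases like the 88 missing entities in subm_id 624.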

fmaussion commented 2 years ago

Updated link to trello: https://trello.com/c/BBNEjwMn

fmaussion commented 2 years ago

@betolink @bruceraup any news on this?

As I said a while ago, I think it would have been much fairer to GLIMS users to add a big WARNING to the download option because of this issue, as long as it is not resolved (which seems to be a larger undertaking).

betolink commented 2 years ago

Hi @fmaussion, you're right. The fixes are not that complicated, but for some reason NSIDC gave priority to putting GLIMS data under the NASA EDL system. I think we can put a warning on the website until we release the fixed version.

bruceraup commented 2 years ago

I fixed this yesterday. As noted on Trello, outlines are no longer dropped. However, glac_bound polygons coming from a multi-polygon, as in the example above, still have the same analysis IDs. This "problem" must be resolved farther upstream. I say "problem", in quotes, because it's not actually wrong -- just a different way of doing things. RGI could group these together as multi-polygons (which is how they were submitted to GLIMS), as another potential solution to the problem of non-one-to-one correspondence between RGI IDs and analysis IDs. That said, converting all multi-polygons in GLIMS to separate ones is definitely on the to-do list.
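To see how many outlines are affected by the non-one-to-one correspondence, one could list the analysis IDs that occur on more than one glac_bound polygon. This is only a sketch: the function name is hypothetical, and it assumes the analysis IDs of the downloaded polygons are already available as a plain list.

```python
from collections import Counter


def shared_analysis_ids(analysis_ids):
    """Return {analysis_id: polygon_count} for IDs used by several polygons.

    `analysis_ids` holds one entry per glac_bound polygon (hypothetical
    input, e.g. read from the attribute table of a download).
    """
    counts = Counter(analysis_ids)
    return {aid: n for aid, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Made-up IDs: 102 stands for a multi-polygon split into three parts.
    print(shared_analysis_ids([101, 102, 102, 103, 102]))
```

An empty result would mean every polygon has its own analysis ID, i.e. no split multi-polygons remain in that download.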

fmaussion commented 2 years ago

Thanks! I'll check it ASAP. I have continued the discussion related to this in the Trello card. As discussed there, I don't think a multipolygon solution in RGI is compatible with glacier attributes such as is_tidewater or glacier length, etc.