gbif / pipelines

Pipelines for data processing (GBIF and LivingAtlases)
Apache License 2.0

isSequenced filter should take extensions into account #1032

Closed thomasstjerne closed 4 months ago

thomasstjerne commented 4 months ago

I think the isSequenced filter currently only looks at the associatedSequences field. As mentioned here https://github.com/gbif/pipelines/issues/515#issuecomment-952021548 it should also include occurrences that have any of these extensions:

  1. DNA derived data
  2. GGBN/amplification
  3. GGBN/cloning
  4. GGBN/gelimage

muttcg commented 4 months ago

@thomasstjerne Correct. Currently, it only checks associatedSequences.

Is there a specific field in the extension that should be checked, or is the mere existence of any listed extension enough?

MortenHofft commented 4 months ago

I believe we sometimes have extensions with absolutely no data in them. I would think those should be excluded.
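
A minimal sketch of the rule as discussed so far, not the actual pipelines code: an occurrence counts as sequenced if associatedSequences is non-blank, or if any of the listed extensions is present with at least one non-blank value. The function names, the dict-based record layout, and the row type URIs below are assumptions made for illustration.

```python
from typing import Dict, List, Mapping, Optional

# Assumed row type identifiers for the four extensions listed in this issue.
SEQUENCE_EXTENSIONS = frozenset({
    "http://rs.gbif.org/terms/1.0/DNADerivedData",
    "http://data.ggbn.org/schemas/ggbn/terms/Amplification",
    "http://data.ggbn.org/schemas/ggbn/terms/Cloning",
    "http://data.ggbn.org/schemas/ggbn/terms/GelImage",
})

Row = Mapping[str, Optional[str]]


def has_data(rows: List[Row]) -> bool:
    """Return True if any row holds at least one non-blank value."""
    return any(
        value is not None and str(value).strip()
        for row in rows
        for value in row.values()
    )


def is_sequenced(core: Row, extensions: Dict[str, List[Row]]) -> bool:
    """Hypothetical isSequenced check covering the core field and extensions."""
    associated = core.get("associatedSequences")
    if associated is not None and associated.strip():
        return True
    # An extension that exists but carries no data does not count.
    return any(
        has_data(extensions.get(row_type, []))
        for row_type in SEQUENCE_EXTENSIONS
    )
```

With this shape, an extension that is declared but has no rows, or only blank values, leaves the flag false, which matches the exclusion suggested above.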

muttcg commented 4 months ago

Deployed to PROD. Related datasets are being reinterpreted.