galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

Duplicate HIDs on usegalaxy.org #3818

Open jmchilton opened 7 years ago

jmchilton commented 7 years ago

See @bgruening's comment and sample history here: https://github.com/galaxyproject/galaxy/issues/3816#issuecomment-289317874. It seems unrelated to the original issue, so I wanted to track it here. We don't rely, and never have relied, on only one thread generating HIDs to ensure uniqueness. Uniqueness should be enforced by Postgres, though, so this is very troubling for sure; xref https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/model/mapping.py#L2528.
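For readers unfamiliar with database-level enforcement, here is a minimal sketch of the general idea, not Galaxy's actual schema or mapping code. It uses an in-memory SQLite database as a stand-in for Postgres, and the table name `history_item` and the helper functions are hypothetical. A `UNIQUE (history_id, hid)` constraint makes the database itself reject a second row with the same HID in the same history, no matter which thread or process issues the insert.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical history_item table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE history_item ("
        " id INTEGER PRIMARY KEY,"
        " history_id INTEGER NOT NULL,"
        " hid INTEGER NOT NULL,"
        " UNIQUE (history_id, hid))"  # assumption: uniqueness is per history
    )
    return conn


def try_insert(conn, history_id, hid):
    """Insert a row; return False if the unique constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO history_item (history_id, hid) VALUES (?, ?)",
                (history_id, hid),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_db()
first = try_insert(conn, 1, 13)       # new (history, hid) pair
duplicate = try_insert(conn, 1, 13)   # same pair again
other_history = try_insert(conn, 2, 13)  # same hid, different history
```

With such a constraint in place, duplicates in one history would surface as an integrity error at insert time rather than appearing silently in the data.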

bgruening commented 3 months ago

@yvanlebras has a history from which a workflow cannot be extracted.

The only thing that I can find in our logs is:

May 16 11:19:51 sn06.galaxyproject.eu gunicorn[3043719]: galaxy.workflow.extract INFO 2024-05-16 11:19:51,877 [pN:main.4,p:3181069,tN:WSGI_3] Cannot find implicit input collection for reference_genome|own_file
May 16 11:19:51 sn06.galaxyproject.eu gunicorn[3043719]: galaxy.workflow.extract WARNING 2024-05-16 11:19:51,882 [pN:main.4,p:3181069,tN:WSGI_3] duplicate hid found in extract_steps [13]
May 16 11:19:51 sn06.galaxyproject.eu gunicorn[3043719]: galaxy.workflow.extract WARNING 2024-05-16 11:19:51,890 [pN:main.4,p:3181069,tN:WSGI_3] duplicate hid found in extract_steps [16]
May 16 11:19:51 sn06.galaxyproject.eu gunicorn[3043719]: galaxy.workflow.extract WARNING 2024-05-16 11:19:51,890 [pN:main.4,p:3181069,tN:WSGI_3] Failed to find matching implicit job - job id is 69246218, implicit pairs are [('output_html', <galaxy.model.HistoryDatasetCollectionAssociation(2569404) at 0x7f09bef7a6d0>)], assoc_name is raw_data.

I tried to find it on sentry, but had no luck.

If I can help with more details, please let me know.

yvanlebras commented 3 months ago

Triskel made 3 histories: one with raw data, then 2 different histories where the input data (data collections) were "copied" from the first history. In the last 2 histories, she applied almost the same tools to get results. Workflow extraction works for one of these histories but not the other. One possibility is that, when creating the third history, she copied the 2 data collections from the second history rather than the first, so the third history's input data collections are copies of the second history's, which are themselves copies from the first... My 2 cents for now.

yvanlebras commented 3 months ago

Triskel just tried copying the history that doesn't work for workflow extraction, deleting all analytical steps (keeping only the input data collections), and then applying a tool to it; in that case, workflow extraction works. So this seems more related to an analytical step than to the input datasets...

yvanlebras commented 3 months ago

OK, it seems this is due to the qualimap bam tool https://ecology.usegalaxy.eu/root?tool_id=toolshed.g2.bx.psu.edu/repos/iuc/qualimap_bamqc/qualimap_bamqc/2.2.2c+galaxy1. Deleting this step allows Triskel to extract a workflow!

mvdbeek commented 3 months ago

It's an unfortunate design choice, but implicit conversions create datasets with the same hid as the source; this isn't problematic per se. Extraction from histories could surely use a good overhaul. In the meantime, it would help to share a record of a problematic history so we can track that in a separate issue.
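To make the shared-hid situation concrete, here is a small sketch of how one might spot which hids occur more than once in a list of history items. The `(hid, name)` tuple shape, the sample names, and the `find_duplicate_hids` helper are hypothetical and only for illustration; this is not how `galaxy.workflow.extract` is implemented.

```python
from collections import defaultdict


def find_duplicate_hids(items):
    """Group item names by hid and return only the hids seen more than once.

    `items` is an iterable of (hid, name) pairs; this shape is an assumption
    made for the example, not Galaxy's data model.
    """
    by_hid = defaultdict(list)
    for hid, name in items:
        by_hid[hid].append(name)
    return {hid: names for hid, names in by_hid.items() if len(names) > 1}


# Hypothetical history contents: a converted dataset reuses its source's hid.
history_items = [
    (12, "reference.fasta"),
    (13, "reads.bam"),
    (13, "reads.bam (implicitly converted)"),
    (14, "report.html"),
]
duplicates = find_duplicate_hids(history_items)
```

A report like this, attached to a problematic history, could help narrow down which step produced the items that share a hid.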

yvanlebras commented 3 months ago

Here is a history: https://ecology.usegalaxy.eu/u/ylebras/h/copy-of-poolfrancoiserfassembles