galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

List of things to improve for biocompute export #10361

Open mvdbeek opened 3 years ago

mvdbeek commented 3 years ago
nsoranzo commented 3 years ago

Ping @HadleyKing

HadleyKing commented 3 years ago

@mvdbeek, could we also address this issue? https://github.com/biocompute-objects/galaxy/issues/21 I feel like it should be possible, but I have no idea where to start.

mvdbeek commented 3 years ago

I don't know what would go into this field. What is described in https://github.com/biocompute-objects/galaxy/issues/21#issuecomment-618894337 is not something we track in our tool model (though I think documenting the external resources a tool accesses would be a good idea). For upload jobs we might be able to provide URLs if the upload happened by pasting a URL, but uploads are generally not part of workflow invocations.

HadleyKing commented 3 years ago

So if I pull in data based on an accession from NCBI or a query from UCSC, it is not tracked?

mvdbeek commented 3 years ago

It is, but not in a structured way that would tell you "hey, this is an external resource". So you'd have to inspect every parameter and guess whether it refers to an external URL / entity.
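To illustrate what "inspect every parameter and guess" would involve, here is a minimal Python sketch. It assumes the job's parameters are available as a nested dict of string values; the helper names and the two regexes (a generic http/https/ftp URL and SRA-style SRR/ERR/DRR run accessions) are hypothetical and only illustrate the heuristic, not anything Galaxy provides.

```python
from __future__ import annotations

import re
from typing import Any, Iterator

# Hypothetical heuristics: a bare http(s)/ftp URL, or an SRA-style run
# accession (SRR/ERR/DRR followed by 6+ digits). Illustrative guesses only.
URL_RE = re.compile(r"^(?:https?|ftp)://\S+$", re.IGNORECASE)
ACCESSION_RE = re.compile(r"^[SED]RR\d{6,}$")


def walk_params(params: Any, path: str = "") -> Iterator[tuple[str, str]]:
    """Yield (dotted_path, value) for every string leaf in nested dicts/lists."""
    if isinstance(params, dict):
        for key, value in params.items():
            yield from walk_params(value, f"{path}.{key}" if path else str(key))
    elif isinstance(params, list):
        for i, value in enumerate(params):
            yield from walk_params(value, f"{path}[{i}]")
    elif isinstance(params, str):
        yield path, params


def guess_external_refs(params: dict) -> list[dict]:
    """Return parameters whose values merely *look* like external resources."""
    hits = []
    for path, value in walk_params(params):
        value = value.strip()
        if URL_RE.match(value):
            hits.append({"param": path, "value": value, "kind": "url"})
        elif ACCESSION_RE.match(value):
            hits.append({"param": path, "value": value, "kind": "accession"})
    return hits
```

For example, guess_external_refs({"input": {"accession": "SRR12345678"}, "threads": "4"}) flags only input.accession. Both false positives (a URL-shaped string that is never fetched) and false negatives (an accession format the regex does not know) are unavoidable, which is why a structured annotation in the tool itself would be more reliable.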

HadleyKing commented 3 years ago

Hmm... In this workflow (my test example), https://galaxy.aws.biochemistry.gwu.edu/u/hadley/w/galaxy-biocomput-object-development-test, one of the tools is an external downloader. Would that be a place to start?

HadleyKing commented 3 years ago

Also, I was going to break out each of the bullets from above into its own issue in the BCO Galaxy repository. Does that make sense, or do you think it is overkill? I see each of these as items that could spawn their own discussion.

mvdbeek commented 3 years ago

I would suggest adding something like an external_service_url="<service_url>" attribute to the tool XML language and then annotating tool parameters that reference an external entity with it. For a tool that downloads accessions, this could look something like <param name="accession" value="SRR12345678" external_service_url="https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=" />
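As a rough sketch of how an exporter could consume such an annotation, the Python below parses a tool definition with the standard library's xml.etree.ElementTree, finds every <param> carrying the proposed attribute, and joins the URL prefix with the job's value for that parameter. The external_service_url attribute is only the proposal above, not part of Galaxy's current tool XML schema; the tool id, the helper names, and the use of a flat dict of job values are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from urllib.parse import quote

# NOTE: `external_service_url` is the attribute *proposed* above; it does
# not exist in the current Galaxy tool XML schema.
PROPOSED_ATTR = "external_service_url"

# Hypothetical tool definition using the proposed attribute.
TOOL_XML = """<tool id="sra_fetch_example" name="Fetch SRA run" version="0.1">
  <inputs>
    <param name="accession" type="text" value="SRR12345678"
           external_service_url="https://trace.ncbi.nlm.nih.gov/Traces/sra/?run="/>
    <param name="threads" type="integer" value="4"/>
  </inputs>
</tool>"""


def external_params(tool_xml: str) -> dict[str, str]:
    """Map param name -> external service URL prefix for annotated <param> elements."""
    root = ET.fromstring(tool_xml)
    return {
        param.get("name"): param.get(PROPOSED_ATTR)
        for param in root.iter("param")
        if param.get(PROPOSED_ATTR)
    }


def resolve_external_refs(tool_xml: str, job_values: dict[str, str]) -> list[str]:
    """Append each annotated parameter's (URL-quoted) job value to its prefix."""
    prefixes = external_params(tool_xml)
    return [
        prefix + quote(job_values[name], safe="")
        for name, prefix in prefixes.items()
        if name in job_values
    ]
```

Called as resolve_external_refs(TOOL_XML, {"accession": "SRR12345678", "threads": "4"}), this yields a single resolvable link for the accession, while the unannotated threads parameter is ignored. Because the tool author declares which parameters point outside Galaxy, no value-based guessing is needed.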

Sure, feel free to open as many issues as make sense. The things I listed here can be addressed in a single PR, which is why there's just one issue.