Open mvdbeek opened 3 years ago
Ping @HadleyKing
@mvdbeek Can we also maybe address this issue? https://github.com/biocompute-objects/galaxy/issues/21 I feel like it should be possible but have no idea where to start...
I don't know what would go into this field. https://github.com/biocompute-objects/galaxy/issues/21#issuecomment-618894337 is not something we track in our tool model (though I think documenting accessed external resources in tools would be a good idea). For upload jobs we might be able to provide URLs if the upload happened by pasting a URL, but uploads are generally not part of workflow invocations.
So if I pull in data based on an accession from NCBI or a query from UCSC, it is not tracked?
It is, but not in a structured way that would tell you "hey, this is an external resource". So you'd have to inspect every parameter and guess whether it refers to an external URL / entity.
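To make the "inspect every parameter and guess" point concrete, here is a minimal sketch of such a heuristic. It is not Galaxy code: the function name `guess_external_urls` and the flat `example_params` dictionary are hypothetical, and real job parameters are likely nested and more varied than this.

```python
from urllib.parse import urlparse


def guess_external_urls(params):
    """Return (name, value) pairs whose value looks like an http(s)/ftp URL.

    Hypothetical helper: `params` is assumed to be a flat dict of
    parameter names to values.
    """
    hits = []
    for name, value in params.items():
        if not isinstance(value, str):
            continue  # only string values can be URLs
        candidate = value.strip()
        parsed = urlparse(candidate)
        if parsed.scheme in {"http", "https", "ftp"} and parsed.netloc:
            hits.append((name, candidate))
    return hits


# Hypothetical parameters for illustration only.
example_params = {
    "accession": "SRR12345678",
    "source": "https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR12345678",
    "threads": 4,
}

found = guess_external_urls(example_params)
```

Note that the bare accession is not detected even though it refers to an external entity, which is exactly the limitation of guessing from unstructured parameters.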
Hmm... https://galaxy.aws.biochemistry.gwu.edu/u/hadley/w/galaxy-biocomput-object-development-test In this workflow (my testing example) one of the tools is an external downloader... Would that be a place to start?
Also I was going to break out each of the bullets from above into a single issue in the BCO galaxy git. Does that make sense or do you think it is overkill? Because I see each of these as items that could spawn a discussion...
I would suggest adding something like external_service="<service_url>" to the tool XML language and then annotating tool parameters that reference an external entity with it.
So for a tool that downloads accessions, this could be something like:
<param name="accession" value="SRR12345678" external_service_url="https://trace.ncbi.nlm.nih.gov/Traces/sra/?run="/>
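As a rough sketch of how such an annotation could be consumed, the snippet below collects every annotated parameter from a tool definition and builds a resource URL from it. Several things here are assumptions: `external_service_url` is only the attribute proposed in this thread and not an existing Galaxy attribute, the tool id `hypothetical_downloader` and the function `external_resources` are made up, and joining the service URL with the parameter value by plain concatenation is a guess based on the example URL ending in `?run=`.

```python
import xml.etree.ElementTree as ET

# Hypothetical tool definition using the proposed (not existing) attribute.
TOOL_XML = """
<tool id="hypothetical_downloader">
  <inputs>
    <param name="accession" value="SRR12345678"
           external_service_url="https://trace.ncbi.nlm.nih.gov/Traces/sra/?run="/>
    <param name="threads" value="4"/>
  </inputs>
</tool>
"""


def external_resources(tool_xml):
    """Map parameter name -> external URL for every annotated <param>."""
    root = ET.fromstring(tool_xml.strip())
    resources = {}
    for param in root.iter("param"):
        base = param.get("external_service_url")
        if base is None:
            continue  # parameter does not reference an external entity
        # Assumption: the service URL is a prefix and the value is appended.
        resources[param.get("name")] = base + param.get("value", "")
    return resources


resources = external_resources(TOOL_XML)
```

With a structured attribute like this, a BCO export would no longer need to guess which parameters point at external services.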
Sure, feel free to open as many issues as make sense. The things I listed here are things that can be addressed in a single PR, that's why there's just one issue.