bgruening / galaxytools

:microscope::books: Galaxy Tool wrappers
MIT License

Integrate tbl2asn into Galaxy #102

Open bgruening opened 9 years ago

bgruening commented 9 years ago

Tbl2asn is a command-line program that automates the creation of sequence records for submission to GenBank. It uses many of the same functions as Sequin but is driven generally by data files. Tbl2asn generates .sqn files for submission to GenBank. Additional manual editing is not required before submission.

http://www.ncbi.nlm.nih.gov/genbank/tbl2asn2/

hexylena commented 9 years ago

+1

nsoranzo commented 9 years ago

A problem we've had with tbl2asn for the Prokka wrapper is that the binary is versioned internally, but the file on the FTP server is simply overwritten without changing its name. Moreover, GenBank often raises the minimum required version, so the binary needs to be updated frequently.
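Because the file name never changes, one way to notice a silent replacement is to compare a checksum of each fresh download against the last one recorded. The following is only a minimal sketch of that idea: the function names and the checksum file are hypothetical, and nothing here is taken from the Prokka wrapper or from NCBI's tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def binary_changed(download: Path, checksum_file: Path) -> bool:
    """Compare a fresh download against the last recorded checksum.

    Returns True (and records the new checksum) when the content differs
    or when no checksum has been recorded yet; returns False otherwise.
    Both paths are hypothetical locations chosen by the caller.
    """
    current = sha256_of(download)
    previous = checksum_file.read_text().strip() if checksum_file.exists() else None
    if current == previous:
        return False
    checksum_file.write_text(current + "\n")
    return True
```

A periodic job could call `binary_changed()` after each download and only trigger a package update when it returns True.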

bgruening commented 9 years ago

@nsoranzo yes, this is always a problem, but we need it for our pipeline. By the way, some time ago I worked on the missing Perl libraries for Prokka. Are you interested in using them (if they work)?

nsoranzo commented 9 years ago

@bgruening I'm not working on Prokka at the moment; it's in the hands of the CRS4 people. You may try to do a pull request, but they are not using tool_dependencies.xml to install tools (and I'm not either).

hexylena commented 9 years ago

I have an idea for storing versioned copies of tbl2asn, I'll make sure we have copies available. @bgruening do you have wrappers written for it already?

bgruening commented 9 years ago

Not yet, but storing versioned copies is not the problem, I guess. The NCBI only accepts Sequin (or similar) files that were processed with a recent version. For example, they ship some kind of word blacklist in this binary to check for spelling mistakes or wrongly annotated genomes/proteins. @erasche I haven't started yet, sorry.

hexylena commented 9 years ago

yep, that's a problem for people updating their galaxy :) (and us having automated package updates when new versions of tbl2asn come out)

hexylena commented 9 years ago

@bgruening https://github.com/galaxyproject/docker-build/blob/v2/tbl2asn/default/build.yml versioned tbl2asn packages here. Once @natefoo gets docker-build running jobs on cron we'll just run this weekly/monthly, and I'll add a job to generate automated PRs against the tbl2asn package in Galaxy, and then that coupled with the automated TTS pushes mean...no stress for us! :)

(Hey, @natefoo, do you just want me to add jenkins jobs for building these packages? I'm happy to, if you can send me an SSH pubkey, I'll ensure built packages go in a single directory, and you can regularly pull from the IUC's build server into depot.)

bgruening commented 9 years ago

@erasche, @nsoranzo the question is: do we need this? This tool is only useful in its most recent version; it is a dead-end tool, isn't it? We always need the latest version (?) Does it make sense to enable reproducibility for this tool by versioning binaries?

If we need this, why not copy the tbl2asn builds from NCBI to depot every month/release?

nsoranzo commented 9 years ago

You don't always need to have the latest version, but they get deprecated very fast. And when they are, you really have to update.

hexylena commented 9 years ago

@bgruening no, we probably don't need versioned copies since old ones are useless. However, I feel like a (completely, 100% automated) update of the version in the TS is preferable to the tool checking, on every run, whether the binary is older than N days and, if so, fetching the latest.
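For comparison, the per-run check being argued against here would look roughly like the sketch below. It is an illustration only: `is_stale` and its parameters are hypothetical names, and the file's modification time is used as a stand-in for the binary's age, which reflects when it was downloaded rather than when NCBI released it.

```python
import os
import time
from typing import Optional


def is_stale(path: str, max_age_days: float, now: Optional[float] = None) -> bool:
    """Return True if the file is missing or older than max_age_days.

    Age is measured from the file's modification time (an assumption;
    this is the download time, not the upstream release date).
    """
    now = time.time() if now is None else now
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # A missing or unreadable binary is treated as needing a fetch.
        return True
    return (now - mtime) > max_age_days * 86400
```

Running this on every job adds a network dependency to each tool run whenever the check fires, which is part of why updating the packaged version centrally is the more attractive option.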

bgruening commented 9 years ago

@erasche agreed, but are you talking about a new package with every new binary? I would go with package_tbl2asn_latest or something like this. This has the advantage of not having to update the tool version with every release.

hexylena commented 9 years ago

Yeah, I was talking about a _latest that would get updated, not registering a new package_tbl2asn_$date, as that'd just be clutter. Would that work for you?

nsoranzo commented 9 years ago

:+1: for package_tbl2asn_latest

hexylena commented 9 years ago

Great, I'll make sure the IUC's jenkins bot can open PRs and set it up to trigger a "tbl2asn definition" update job whenever docker-build creates a new tbl2asn version.

natefoo commented 9 years ago

@erasche Works for me. Public key is at https://github.com/natefoo.keys

hexylena commented 9 years ago

@natefoo mind testing that you can log in and pull data? You should be able to ssh in as natefoo@gx.hx42.org, and you'll find jenkins will publish all produced files to the data/ directory in your home folder (/opt/depot/data/).

I'll set up jenkins/docker to build + place more binaries in there and ensure that they're versioned.

jvolkening commented 7 years ago

Two years later, but since this issue is still 'open'...

I would like to have tbl2asn as a standalone tool for use within various annotation and submission workflows. In checking to see if it exists I ran across this thread. However, it doesn't appear that tbl2asn itself was ever wrapped but rather included in other pipeline-based tools. Is this right? Am I duplicating anything existing if I wrap tbl2asn itself?

I see that tbl2asn is already in Bioconda. As for the versioning issues above (which still apply), in my opinion it's not too much to expect system admins to update their versions periodically. This could easily be done with a cron job every month.
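A rough sketch of what such a monthly job might run is below. It assumes the Bioconda package is named `tbl2asn`, that `conda` is on the PATH of the account running the job, and that the relevant environment is already active; the function names are hypothetical, and a real Galaxy deployment may manage its conda environments differently.

```python
import subprocess
from typing import List


def conda_update_command(package: str = "tbl2asn", channel: str = "bioconda") -> List[str]:
    """Build the argument list for a non-interactive conda update.

    The package and channel defaults are assumptions for illustration.
    """
    return ["conda", "update", "--yes", "--channel", channel, package]


def run_update() -> int:
    """Run the update and return conda's exit code (0 on success)."""
    return subprocess.run(conda_update_command(), check=False).returncode
```

A cron entry would then invoke a small script that calls `run_update()` once a month and reports a non-zero exit code to the admin.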

bgruening commented 7 years ago

@jvolkening happy to accept a PR with this tool. BioConda will make it easier for us to maintain it. Let's close this once and for all.