Closed: thorellk closed this issue 3 years ago.
@thorellk, I like this idea overall. We can integrate it once we're sure of the current code.
Regarding the script, here's the Python 3 version.
The only change I made when converting this script was updating the print statements; luckily, the script is short :)
```python
print('Parsing {}'.format(args.input))
...
print('Wrote {0} contigs to {1}.'.format(count, args.output))
```
Please let me know if it works as expected.
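The posted snippet only shows the updated print statements. As a hedged illustration, a complete Python 3 script with the same command-line interface could look roughly like the sketch below. It is not the script shared in this thread: it assumes that `--input`, `--output` and `--pre` behave as discussed here and that every header is replaced by the `--pre` value plus a running number starting at 1 (e.g. `strainX_contig1`). The default prefix `contig` and the function names `rename_fasta` and `main` are hypothetical.

```python
"""Hypothetical sketch of rename_fasta.py (not the original script).

Replaces every FASTA header with '<prefix><n>' (n = 1, 2, ...) and copies
sequence lines through unchanged.
"""
import argparse


def rename_fasta(input_path, output_path, prefix):
    """Stream input_path to output_path with renamed headers; return the contig count."""
    count = 0
    with open(input_path) as fin, open(output_path, 'w') as fout:
        for line in fin:
            if line.startswith('>'):
                count += 1
                # Assumption: the new header is only the prefix plus a counter.
                fout.write('>{}{}\n'.format(prefix, count))
            else:
                fout.write(line)
    return count


def main(argv=None):
    """Parse the command line (sys.argv[1:] when argv is None) and run the renaming."""
    parser = argparse.ArgumentParser(description='Give FASTA records short, numbered headers.')
    parser.add_argument('--input', required=True, help='FASTA file to read')
    parser.add_argument('--output', required=True, help='FASTA file to write')
    # Hypothetical default; the thread does not say what the original script uses.
    parser.add_argument('--pre', default='contig', help='header prefix, e.g. strainX_contig')
    args = parser.parse_args(argv)

    print('Parsing {}'.format(args.input))
    count = rename_fasta(args.input, args.output, args.pre)
    print('Wrote {0} contigs to {1}.'.format(count, args.output))
    return count
```

Saved as `rename_fasta.py` with the usual `if __name__ == '__main__': main()` guard appended, it could be called as `python3 rename_fasta.py --input A-salmonicida.contigs.fa --output A-salmonicida.contigs.renamed.fa --pre A-salmonicida_contig`. Reading and writing line by line avoids loading the whole assembly into memory.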
Works like a charm!
P.S. You could try out the 2to3 tool, which comes bundled with Python 3, to check that no major changes are needed for the upgrade.
@thorellk, just to confirm: you would like this rename_fasta.py to be included in the SCREEN_FOR_CONTAMINATION process, right?
Am I correct in invoking the script like this?

```shell
python3 rename_fasta.py --input A-salmonicida.contigs.fa --output A-salmonicida.contigs.renamed.fa
```
I understand that we can pass a --pre value such as --pre strainX_contig, which would create the same pattern you mentioned earlier. However, what exact value for --pre would you use?
Perhaps in the end it would look like strain_mucosa_contig1, etc.; a few examples would help :)
Hi @abhi18av!
Sorry for the delay in responding. Yes, I thought it would be good to have it as the first step in the SCREEN_FOR_CONTAMINATION process, since it may reduce the risk of sendsketch failing. The --pre pattern should be pair_id_'contig' (i.e. the pair_id followed by _contig); would that be possible? For the example above it would be A-salmonicida_contig.
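To make the agreed naming concrete, the sketch below derives the `--pre` value from a pair_id and lists the headers the first few contigs would receive. It assumes, as in the strain_mucosa_contig1 example, that the script appends a running number starting at 1 to the prefix. The helper names `contig_prefix`, `rename_command` and `example_headers` are hypothetical and only illustrate the pattern.

```python
def contig_prefix(pair_id):
    """Build the --pre value from a sample's pair_id (agreed pattern: <pair_id>_contig)."""
    return '{}_contig'.format(pair_id)


def rename_command(pair_id, contigs, renamed):
    """Hypothetical helper: argument list for one rename_fasta.py call."""
    return ['python3', 'rename_fasta.py',
            '--input', contigs,
            '--output', renamed,
            '--pre', contig_prefix(pair_id)]


def example_headers(pair_id, n):
    """Headers the first n contigs would get, assuming numbering starts at 1."""
    return ['{}{}'.format(contig_prefix(pair_id), i) for i in range(1, n + 1)]


print(rename_command('A-salmonicida', 'A-salmonicida.contigs.fa',
                     'A-salmonicida.contigs.renamed.fa'))
print(example_headers('A-salmonicida', 3))
# ['A-salmonicida_contig1', 'A-salmonicida_contig2', 'A-salmonicida_contig3']
```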
Hi @abhi18av! We are back from holidays now :) Should we have a Skype/Zoom call some day later this week to catch up?
Hi @thorellk!
Sure, I'm available as well. Please let me know a good time to connect.
Hi! Either today (I am available for another 4 hours), Thursday afternoon, or Friday any time between 9 and 17 CET would work for me. How about you?
@thorellk, I have a few backlog tasks lined up for today, so if possible I'd prefer Thursday; anything between 14 and 17 CET works :)
Please feel free to send a calendar invite with the meeting link to abhi18avatoutlookdotcom.
Great, I sent an invite for a Zoom call Thursday 14.00 CET :)
Thanks! Looking forward to finalizing the pipeline soon!
The renaming script has been integrated in commit 7a6d00fa72f6628ee3b52b6a74db89801ab94d1b.
I would like to integrate a script that renames the headers of the shovill assembly FASTA files to short, standardized headers including the strain identifier, so that they would look like
This would hopefully make them less error-prone and also more suitable for the Prokka output. This way we would not have to write out the shovill assembly output file and could write only the cleaned assembly file after the screen for contaminants process, which would also fit better with the new use-screened-contigs branch. The only problem is that the script I have used so far is not written in Python 3, which I think it should be, to minimize extra dependencies. Would any of you @abhi18av @emilio-r or @boulund be able to translate it to Python 3 for me? I saved it as a .txt file since .py files could not be uploaded here.
rename_fasta.txt