Closed Djangodu closed 1 year ago
Dear Djangodu, thank you very much for your interest in our work. Could you share how you setup your environment and which python version you use?
Thank you very much, Best, Kevin
Dear Kevin
Thanks for your reply. I set up the environment as the transposon_annotation_reasonaTE README.md recommends, using mamba and conda to install it. The Python version is 2.7.
Looking forward to your reply. Sincerely yours
Actually, my results also show some tracebacks, as in the picture I posted. Is that normal when running this step?
Dear Kevin
I used your xml file to successfully create the environment. Thank you very much for your donation.
However, I have another question. The genome sequence is very large, at least 3.0 GB, and I would like to run it with your annotation tool. Is it a good idea to split it into single chromosomes? Some of the tools in this annotation workflow would take very long on such a big genome.
I am not sure whether splitting is appropriate for whole-genome annotation. I know that the classification runs after the annotation tools finish, but if splitting the whole genome sequence into twenty pieces creates twenty results, how do I combine them again?
Is it right to do that? Or must I run the annotation on the whole genome sequence without splitting it?
If I have to run the whole genome sequence at once, is there a way to use more than 1 or 2 threads? With so few threads, a big genome takes very long.
Looking forward to your reply. Sincerely yours
Hello Djangodu, I think, depending on your computational capacities, it makes sense to split up the genome and analyse the parts separately. For this you need to create separate projects, as one project can only contain one sequence.fasta file.
It is not a problem at all, as you split them not randomly but by chromosome. If you cut these chromosomes into pieces it would still be valid, you would just risk missing transposons at the cutting points.
You could try to run the separate projects in parallel in different threads on your Linux machine, and then it should be totally fine.
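A minimal sketch of the splitting step, assuming the genome is a multi-FASTA file with one record per chromosome and that each record name is usable as a file name. The function name `split_fasta_by_record` and the output layout are illustrative only and not part of reasonaTE. Each resulting file could then serve as the sequence.fasta of its own project.

```python
from pathlib import Path


def split_fasta_by_record(fasta_path, out_dir):
    """Write each FASTA record to its own file; return {record_name: path}."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    handle = None
    try:
        with open(fasta_path) as src:
            for line in src:
                if line.startswith(">"):
                    if handle is not None:
                        handle.close()
                    # Record name = first whitespace-delimited token of the header.
                    parts = line[1:].split()
                    name = parts[0] if parts else f"record{len(written) + 1}"
                    path = out_dir / f"{name}.fasta"
                    handle = open(path, "w")
                    written[name] = path
                # Lines before the first header are ignored.
                if handle is not None:
                    handle.write(line)
    finally:
        if handle is not None:
            handle.close()
    return written
```

The file is read line by line, so a 3 GB genome does not need to fit into memory.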
After you have finished everything, you could use common bioinformatic tools to combine the separate annotation files into a whole again. If you are not able to write a programme or use bioinformatic tools, you could also just open a text editor such as "Notepad++" and concatenate the annotation files yourself. You just need to rename the sequence to the name of your chromosome before putting everything together again. You could just replace "seq1" with the name of the chromosome and that's it.
seq1 reasonaTE transposon 29700 29860 . + . transposon=251;class=2/1/2(hAT,TIR,DNATransposon)
seq1 reasonaTE transposon 34769 34942 . + . transposon=252;class=2/1/2(hAT,TIR,DNATransposon)
seq1 reasonaTE transposon 86291 87619 . + . transposon=253;class=2/1/5(Zator,TIR,DNATransposon)
seq1 reasonaTE transposon 88210 89547 . + . transposon=254;class=2/1/5(Zator,TIR,DNATransposon)
seq1 reasonaTE transposon 178154 182145 . + . transposon=255;class=2/1/5(Zator,TIR,DNATransposon)
seq1 reasonaTE transposon 199518 199683 . + . transposon=256;class=2/1/2(hAT,TIR,DNATransposon)
seq1 reasonaTE transposon 200395 200562 . + . transposon=257;class=2/1/4(Sola,TIR,DNATransposon)
seq1 reasonaTE transposon 214649 214819 . + . transposon=258;class=2/1/3(CMC,TIR,DNATransposon)
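The renaming and concatenation described above could be scripted roughly as follows. This is a sketch under assumptions: the annotation files are tab-separated GFF-style text with the sequence name in the first column, and `merge_annotations` and its parameters are hypothetical names, not a reasonaTE function.

```python
def merge_annotations(files_by_chromosome, merged_path, placeholder="seq1"):
    """Concatenate per-chromosome annotation files, renaming column 1.

    files_by_chromosome maps a chromosome name to the annotation file
    produced by the project that was run on that chromosome.
    Returns the number of annotation lines written.
    """
    count = 0
    with open(merged_path, "w") as out:
        for chromosome, path in files_by_chromosome.items():
            with open(path) as src:
                for line in src:
                    # Skip blank lines and comment/header lines.
                    if not line.strip() or line.startswith("#"):
                        continue
                    fields = line.rstrip("\n").split("\t")
                    # Replace only the sequence-name column, never other text.
                    if fields[0] == placeholder:
                        fields[0] = chromosome
                    out.write("\t".join(fields) + "\n")
                    count += 1
    return count
```

Restricting the replacement to the first column avoids touching "seq1" if it happens to occur elsewhere in a line. The `transposon=` numbers may repeat between projects; this sketch does not renumber them.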
I hope that helps a little. Keep in mind that 3 GB is a lot, so you had better look for a cluster to run this on. Please keep me updated in case you make progress or need any help. Best regards, Kevin
Dear Kevin
Thank you very much for your patient responses to my questions. These suggestions are very important to me.
Best wishes. Sincerely yours
Dear Kevin
First, I must say that your work is brilliant. These days I have been trying my best to build the environment for it, installing everything step by step following your guidance.
However, there are some errors I can't deal with.
After installing the software, I downloaded the sequence.fasta you suggested and ran it following the transposon_annotation_reasonaTE protocol. Everything went smoothly until two errors occurred while running Step 4, and I don't know how they happened or how to deal with them.
For the first ERROR, I suspect the output files are not in the right format, but I don't know why, since I followed every step the instructions recommend.
For the second ERROR, I compared your testProject files with mine and found just two files in yours:
PipelineAnnotations_TransposonSequencesClasses.txt ToolAnnotations_TransposonSequencesClasses.txt
ToolAnnotations_TransposonSequencesClasses.txt.featuresA_20230827_134213.temp ToolAnnotations_TransposonSequencesClasses.txt.featuresA_20230827_144626.temp ToolAnnotations_TransposonSequencesClasses.txt.featuresB_20230827_134213.temp
These are the two ERRORs:
ValueError: node array from the pickle has an incompatible dtype:
Step 4) Run the pipeline on the genome annotations