Open cahende opened 2 years ago
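Hi! Thanks for reporting this. Looks like the code has some trouble writing the results to a file. My first guesses would be:
- the actual string of [RAW DATA] or [TE] contains some symbol that turns it into an invalid filepath, e.g. / or a space? Seems odd though if this happens for all TEs
- Permissions of the directory that it tries to write to could be another issue, but then I would expect a different Error.
Would you mind sharing the command used to run deviaTE? And maybe double-check that the library of TE sequences is a valid fasta file? cheers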
Hi,
So the TE names that EDTA output actually had a "/" in all the names, so I think that is the issue. I corrected this in my reference library and am rerunning now, I will let you know if this issue persists.
Thanks! Cory
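The renaming described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `sanitize_name` and `sanitize_fasta` are hypothetical, the whitelist of allowed characters is a guess rather than anything deviaTE prescribes, and the file names in the usage comment are placeholders.

```python
import re

# Characters kept as-is; everything else becomes a dash.
# This whitelist is an assumption for illustration, not the set deviaTE uses.
UNSAFE = re.compile(r"[^A-Za-z0-9_.#-]")


def sanitize_name(name: str) -> str:
    """Replace characters that are risky in file names with dashes."""
    return UNSAFE.sub("-", name)


def sanitize_fasta(lines):
    """Yield FASTA lines with the ID (first word) of each header cleaned."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            parts = line[1:].split(None, 1)  # [ID, optional description]
            if parts:
                parts[0] = sanitize_name(parts[0])
            yield ">" + " ".join(parts)
        else:
            yield line  # sequence lines pass through unchanged


# Usage (placeholder file names):
# with open("TE_library.fa") as src, open("TE_library.clean.fa", "w") as dst:
#     for line in sanitize_fasta(src):
#         dst.write(line + "\n")
```

Two different names can become identical after cleaning, so it is worth checking the cleaned library for duplicate IDs before mapping.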
Hi,
The naming convention was the issue, it seems to be running fine now. On a side note - I am scanning for the presence of a large list of transposable elements and many don't have any reads mapping. Is there any way to prevent output from being produced when there are no reads mapping to a particular element?
Thank you, Cory
This is the output I get for each transposable element in my test. To me this suggests there are no reads mapping to this particular TE?
**** Analysis
Starting analysis of TE_00000718_INT#LTR-unknown in SRR10235406-final.fastq.fused.sort.bam..
No annotaions found for: TE_00000718_INT#LTR-unknown
Normalization: none (values are raw abundances)
Analysis completed - output written to: SRR10235406-final.fastq.TE_00000718_INT#LTR-unknown
**** Visualization
Loading data: SRR10235406-final.fastq.TE_00000718_INT#LTR-unknown
Visualization written to: SRR10235406-final.fastq.TE_00000718_INT#LTR-unknown.pdf
Hi! Glad that your original issue was solved. deviaTE should probably check for such situations itself to be fair. I'll implement a fix for that. Concerning your second question: If there are no reads mapping to a TE reference, then deviaTE should give a message like this:
...
******************** Analysis
Starting analysis of [TE] in [BAM-FILE]..
No reads mapped to the specified reference sequence
...
The program should then exit without producing any output. Hope this helps! Lukas
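For a large TE list, one possible workaround is to decide up front which references have any mapped reads and only analyse those. The sketch below is not part of deviaTE. It assumes the BAM file is indexed, that its reference names are the TE names, and that the text output of `samtools idxstats` (tab-separated: name, length, mapped, unmapped) has been saved to a file. The function name `references_with_reads` is hypothetical.

```python
def references_with_reads(idxstats_text: str, min_reads: int = 1):
    """Return reference names with at least `min_reads` mapped reads.

    Expects the tab-separated output of `samtools idxstats`:
    name, sequence length, mapped reads, unmapped reads.
    """
    keep = []
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "*":  # summary line for reads without a reference
            continue
        if int(mapped) >= min_reads:
            keep.append(name)
    return keep


# Usage (placeholder file name), after running something like
#   samtools idxstats sample.bam > sample.idxstats.txt
# with open("sample.idxstats.txt") as handle:
#     names = references_with_reads(handle.read())
```

The counts from `samtools idxstats` come from the index and ignore any read filtering deviaTE applies, so a reference that passes this check may still yield little or no signal in the analysis.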
I added a check to replace invalid characters in TE names, which should prevent the original error (https://github.com/W-L/deviaTE/commit/10d2b7063b2fef7fcaa24b0a45fa655a0c4d7565). I'm not going to make a new release of the package at this point. But if you would like to make use of this change, you can replace the code file on your computer with the updated one (bin/deviaTE_analyse in this repository). In case you installed the tool via conda, it should be located somewhere along the lines of:
~/miniconda3/envs/deviaTE_env/bin/deviaTE_analyse
Thank you for creating a fix for that naming issue. I am still curious about the other issue where it said I had no annotations but I still received output, can you explain what that means?
Cory
No problem! Forgot to mention that the fix is basically replacing problematic characters with dashes, so that the analysis can proceed without issues.
The message about "no annotations" refers to the optional parameter --annotation. This can be used to provide GFF3 files with annotations of the TE sequences, e.g. the location of CDS and other defined genetic elements. These will mainly be used in the visualisation, e.g. at the bottom of this one: [image] https://user-images.githubusercontent.com/16755298/161801714-24779b2b-0c4d-4aeb-82e3-e7a74214f75b.png
Ahh, I see thanks for clarifying! So it is working as intended, fantastic.
I also wanted to broach another more broad question since I have your attention:
I am trying to identify TEs in unassembled natural genomes (not high enough coverage for a full assembly, especially for high repeat regions), so the library I am using is from TEs identified in a chromosome level genome build of a colony population. I feel like I will be missing potentially novel TEs circulating in these natural populations by using this method, which is the intent of this analysis. Can you provide any ideas on how to build a more fitting library for identification so I can identify TEs that might not be represented in the colony genome?
Thank you, Cory
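That's a tricky one. I think a two-pronged approach might be worth considering in this case.
- Repository-based: Try and collect all relevant sequences from already existing TE databases for the species that you are studying
- De-novo assembly of repeats from raw reads: There are quite a few tools that can do this, but I don't know for which species and coverage they are suitable. Some that come to my mind are RepeatExplorer (https://pubmed.ncbi.nlm.nih.gov/23376349/), dnaPipeTE (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4419797/), REPdenovo (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4792456/).
You could then, for example, use a combined library of TE sequences from these with deviaTE to quantify the TE content. A possibly helpful review with lots of links to databases & tools: https://www.nature.com/articles/s41576-018-0050-x#ref-CR77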
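Building such a combined library could look roughly like the following sketch. It is an illustration, not a recommended pipeline: the helper names `read_fasta` and `combine_libraries` and the source labels are hypothetical, and only exactly identical sequences are collapsed.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs; the name is the first word of the header."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else "unnamed"), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def combine_libraries(libraries):
    """Merge {label: fasta_lines} into one list of (name, sequence).

    Names are prefixed with the source label so they stay unique across
    sources; sequences identical to one already seen (ignoring case) are
    dropped.
    """
    seen = set()
    combined = []
    for label, lines in libraries.items():
        for name, seq in read_fasta(lines):
            key = seq.upper()
            if key in seen:
                continue
            seen.add(key)
            combined.append((f"{label}_{name}", seq))
    return combined


# Usage (placeholder file names):
# with open("database_TEs.fa") as db, open("denovo_repeats.fa") as dn:
#     merged = combine_libraries({"db": db, "denovo": dn})
```

Near-identical copies of the same element will survive this exact-match check, so a proper clustering step would probably still be needed before treating the result as a non-redundant library.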
Thank you for the very useful information. Let me get back to you when I have had a chance to run this. I appreciate your help!
Cory
Hello,
I am trying to run this for a set of raw sequences for Anopheles gambiae. I used EDTA to create a TE library from the agamP4 genome assembly and then used my raw sequences as input for this pipeline to identify which TEs are present in which samples we have. Following trimming/mapping, the pipeline attempts to identify TEs but I get the following error for every TE identified by EDTA.
Starting analysis of [TE] in [RAW DATA]-final.fastq.fused.sort.bam..
No annotaions found for: [TE]
Traceback (most recent call last):
  File "/home/ch943/bin/miniconda/envs/deviaTE_env/bin/deviaTE_analyse", line 100, in <module>
    sample.write_frame(out=args.output + '.raw', insertions=ihat, command=comm, t=timestamp, norm='raw')
  File "/home/ch943/bin/miniconda/envs/deviaTE_env/lib/python3.6/site-packages/deviaTE/deviaTE_pileup.py", line 204, in write_frame
    with open(out, 'w') as outfile:
FileNotFoundError: [Errno 2] No such file or directory: '[RAW DATA]-final.fastq.[TE].raw'
Any guidance would be appreciated.
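The error can be reproduced outside deviaTE, which helps explain it: when an output file name contains a "/", `open()` treats the part before the slash as a directory, and if that directory does not exist the call fails with `FileNotFoundError`. The sketch below uses made-up file names and a hypothetical helper `try_write`. It is not deviaTE code.

```python
import os
import tempfile


def try_write(directory: str, filename: str) -> str:
    """Try to create `filename` inside `directory` and report what happened."""
    path = os.path.join(directory, filename)
    try:
        with open(path, "w") as handle:
            handle.write("test\n")
        return "ok"
    except FileNotFoundError:
        # The part before "/" is read as a directory that does not exist.
        return "FileNotFoundError"


with tempfile.TemporaryDirectory() as tmp:
    results = {
        name: try_write(tmp, name)
        for name in ("sample.LTR-unknown.raw", "sample.LTR/unknown.raw")
    }
```

Here the name with a dash is written normally, while the name with a slash fails in the same way as the traceback above.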