Dear @Lynuxoo
You're not facing exactly the same issue as the one in #15.
ChimeraTE has been updated today to version 1.2. Along with several minor bug fixes, I've addressed yours.
Now the pipeline moves on even if the results.xprs file has not been generated, which was the cause of your issue.
Closing this issue; feel free to reopen it if you need to.
Cheers
Dear @OliveiraDS-hub
Thanks for your response and updates!
However, I'm sorry to say that I have run into problems again with version 1.2, so I need to reopen the issue. This time the pipeline doesn't even move on.
[Friday 15/3/2024 - 11h:37] Perfoming bowtie2 alignment for transcripts...
54063342 reads; of these:
54063342 (100.00%) were paired; of these:
17277796 (31.96%) aligned concordantly 0 times
5674228 (10.50%) aligned concordantly exactly 1 time
31111318 (57.55%) aligned concordantly >1 times
----
17277796 pairs aligned concordantly 0 times; of these:
131105 (0.76%) aligned discordantly 1 time
----
17146691 pairs aligned 0 times concordantly or discordantly; of these:
34293382 mates make up the pairs; of these:
31495021 (91.84%) aligned 0 times
411346 (1.20%) aligned exactly 1 time
2387015 (6.96%) aligned >1 times
70.87% overall alignment rate
Done!
[Friday 15/3/2024 - 13h:02] Calculating transcripts expression...
Unable to calculate gene expression! Including all transcripts with at least one read to the downstream analysis...
Traceback (most recent call last):
File "chimTE_mode2.py", line 179, in <module>
alignment_func(out_dir, aln_dir, mate1, mate2)
File "scripts/mode2_alignment.py", line 50, in alignment_func
pd.read_csv(str(f"{aln_dir}/genes_total_expressed.bed"), header=None, sep="\t", usecols=[0],names=['gene_id']).drop_duplicates().to_csv(f"{aln_dir}/genes_expressed_IDs.lst", header=None, index=False)
File "/home/huangfl/miniconda3/envs/chimeraTE/lib/python3.6/site-packages/pandas/io/parsers.py", line 688, in read_csv
return _read(filepath_or_buffer, kwds)
File "/home/huangfl/miniconda3/envs/chimeraTE/lib/python3.6/site-packages/pandas/io/parsers.py", line 454, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "/home/huangfl/miniconda3/envs/chimeraTE/lib/python3.6/site-packages/pandas/io/parsers.py", line 948, in __init__
self._make_engine(self.engine)
File "/home/huangfl/miniconda3/envs/chimeraTE/lib/python3.6/site-packages/pandas/io/parsers.py", line 1180, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/home/huangfl/miniconda3/envs/chimeraTE/lib/python3.6/site-packages/pandas/io/parsers.py", line 2010, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 382, in pandas._libs.parsers.TextReader.__cinit__
File "pandas/_libs/parsers.pyx", line 674, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: [Errno 2] No such file or directory: '/rd/huangfl/chimeraTE_1.2/ChimeraTE/projects/brainvar_mode2/P1_rep1_HSB272/alignment/genes_total_expressed.bed'
This is what I have in the output directory.
P1_rep1_HSB272
└── alignment
├── fpkm_counts
├── genes.bam
├── genes.bed
├── tes.bam
└── tes.bed
Thank you again for your help.
Dear @Lynuxoo
I'm so sorry for my oversight. Indeed, the exception was not handled correctly.
I've now successfully reproduced your error message, and after the correction everything ran smoothly.
Please remove all mode2 scripts from the "scripts" folder, and also chimTE_mode2.py from the main folder.
Download them again from GitHub and put them in their respective directories.
Then run your analysis again using the same --project name that you used before. This avoids creating the indexes and performing the alignments again.
ChimeraTE Mode 2 now checks whether the output of each step has already been created.
If you follow these steps, you will see this standard output on your screen:
[Friday 15/3/2024 - 12h:22] Creating bowtie2 index for TEs...
TE index has been found!
Skipping...
[Friday 15/3/2024 - 12h:22] Creating bowtie2 index for transcripts...
Transcripts has been index found
Skipping...
Running analysis with ------------------------------------------> rep1
[Friday 15/3/2024 - 12h:22] Perfoming bowtie2 alignment for TEs...
TE alignment has been found!
Skipping...
[Friday 15/3/2024 - 12h:22] Perfoming bowtie2 alignment for transcripts...
TE alignment has been found!
Skipping...
[Friday 15/3/2024 - 12h:22] Calculating transcripts expression...
Expression files have been found!
Skipping...
[Friday 15/3/2024 - 12h:22] Identifying chimeric transcripts...
Chimeric transcripts file has been found!
Skipping...
[Friday 15/3/2024 - 12h:22] Merging coverage from different isoforms...
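The "found, skipping" behaviour shown in the log above can be sketched roughly as follows. This is only a minimal illustration of the idea, not ChimeraTE's actual code: the helper name `run_step` and the "file exists and is non-empty" criterion are assumptions made for this example.

```python
import os


def run_step(label, outputs, action):
    """Run `action` unless every expected output already exists.

    `run_step` is a hypothetical helper for illustration only.
    A step counts as done when all paths in `outputs` are non-empty files.
    Returns True if the step was run, False if it was skipped.
    """
    if all(os.path.isfile(p) and os.path.getsize(p) > 0 for p in outputs):
        print(f"{label}: output has been found! Skipping...")
        return False
    action()
    return True
```

Checking for non-empty files rather than mere existence is a design choice for the sketch: a step that crashed halfway often leaves an empty output behind, and that should not be mistaken for a finished step.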
The issue is now reopened. I look forward to hearing whether this has worked for you.
Thank you!
Dear @OliveiraDS-hub
Thanks for your prompt response. With your assistance, I've successfully completed the analysis of my samples using the modified script. I truly appreciate your support in navigating through the process.
I have another question regarding a message that appeared: "Unable to calculate gene expression! Including all transcripts with at least one read to the downstream analysis..."
What could be the reason behind this prompt? Could it be due to a low alignment rate? The alignment rates for some of my samples are as low as forty percent, whereas previously when aligning these same samples to the reference genome, the alignment rates were around ninety percent. If this is due to my data, what impact might it have on the analysis results I obtain?
Once again, thank you for your invaluable help and for creating such a useful tool!
Thank you @Lynuxoo, I hope ChimeraTE can help with your next findings!
Regarding the impact of including all transcripts instead of only those above a certain expression level (default FPKM >= 1): this will only increase your processing time. For big genomes, such as those of many plants and mammals, it is faster to run the expression analysis and restrict the gene set to expressed genes than to process all genes. Since genes with very low expression are unlikely to produce chimeras, it's reasonable not to check them for chimeric reads.
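To make the FPKM filter concrete, here is a rough sketch of how such a selection could be done by hand. It is not the code ChimeraTE uses. It assumes that results.xprs is a tab-separated table with a header row containing `target_id` and `fpkm` columns, and the function name `expressed_ids` is made up for this example.

```python
import csv


def expressed_ids(xprs_path, min_fpkm=1.0):
    """Return the target IDs whose FPKM is at least `min_fpkm`.

    Assumes a tab-separated file with `target_id` and `fpkm` columns.
    Returns None when the file is missing, so the caller can fall back
    to keeping every transcript, as the warning message describes.
    """
    try:
        with open(xprs_path, newline="") as fh:
            reader = csv.DictReader(fh, delimiter="\t")
            return [row["target_id"] for row in reader
                    if float(row["fpkm"]) >= min_fpkm]
    except FileNotFoundError:
        return None
```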
As for your warning message, several things could explain it, such as unexpected special characters in your transcript IDs, incorrect strand parameters, and of course a low alignment rate. In your case, I doubt it is related to the alignment rate; it must be something else.
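For the first of those possibilities, a quick way to look for unusual characters in the transcript IDs is to scan the FASTA headers. The sketch below is only an illustration: the set of "safe" characters (letters, digits, underscore, dot, hyphen) is an assumption made for this example, not a rule documented by ChimeraTE or express, and `suspicious_ids` is a made-up name.

```python
import re

# Assumed "safe" character set for IDs; adjust to your own naming scheme.
SAFE_ID = re.compile(r"^[A-Za-z0-9_.\-]+$")


def suspicious_ids(fasta_path):
    """Return FASTA header lines (without '>') that contain other characters.

    The whole header line is checked, so headers carrying a description
    after a space are reported as well.
    """
    bad = []
    with open(fasta_path) as fh:
        for line in fh:
            if line.startswith(">"):
                seq_id = line[1:].strip()
                if not SAFE_ID.match(seq_id):
                    bad.append(seq_id)
    return bad
```

An empty list does not rule out an ID problem; it only means no header falls outside the assumed character set.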
If you want to investigate the causes, we need to check the stdout and stderr messages from express.
You can enable express messages by changing a single line in ChimeraTE's code. To do so, change line 49 of the mode2_alignment.py script from:
subprocess.call(['express', '-o', str(f"{aln_dir}/fpkm_counts"), '-O', '1', '--output-align-prob', '--no-bias-correct', str('--' + str(args.strand)), str(args.transcripts), str(f"{aln_dir}/genes.bam")], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
to
subprocess.call(['express', '-o', str(f"{aln_dir}/fpkm_counts"), '-O', '1', '--output-align-prob', '--no-bias-correct', str('--' + str(args.strand)), str(args.transcripts), str(f"{aln_dir}/genes.bam")])
Save it, run your analysis again and tell me what happens.
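If it is easier to share a file than to copy text from the terminal, the same idea can be taken one step further by redirecting the tool's messages into a log. This is a generic sketch and not part of ChimeraTE: `run_and_log` is a hypothetical helper, and the command list and log path are whatever you choose to pass in.

```python
import subprocess


def run_and_log(cmd, log_path):
    """Run `cmd`, write its stdout and stderr to `log_path`, return the exit code."""
    with open(log_path, "w") as log:
        # stderr=subprocess.STDOUT sends both streams to the same log file.
        proc = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return proc.returncode
```

A non-zero return code together with the log contents is usually enough to tell whether the tool failed on its input or never started at all.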
I'm going to close this issue. If you want to dig further into the causes of the express problem, please open a new issue focused on that.
Cheers
Hello,
Thanks for your tool!
First, I got a low alignment rate for reads against the transcript FASTA, but the pipeline still continued to run.
Then I ran into problems at the step that merges coverage from different isoforms.
I tried to follow the suggestion you gave in #15 and checked the format of my transcript FASTA file.
This is my genes_expressed_IDs.lst
And this is the head of my genes.bed
Is there anything unusual, or is the problem just caused by my low alignment rate?
Many thanks.