yanlina0205 closed this issue 3 years ago.
Hi @yanlina0205,
If you have paired-end RNA-seq read files, then I recommend using the `fusion-bloom` make script, which runs the whole pipeline: transcriptome assembly, the alignments, and finally `find_sv_transcriptome.py`.
To answer your questions:
The FASTA index can be generated with `samtools faidx`. You will see a `*.fai` file generated for your FASTA file.
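To make the `.fai` format concrete, here is a minimal pure-Python sketch (not samtools itself) that computes the five columns `samtools faidx` writes for each record: name, sequence length, byte offset of the first base, bases per line, and bytes per line. It assumes ASCII text with Unix (LF) line endings and equal-length sequence lines within each record, which samtools also requires. The `fetch` helper is hypothetical and only shows why the index is useful: the byte position of any base can be computed directly instead of scanning the file.

```python
from __future__ import annotations


def fai_entries(fasta_text: str) -> list[tuple[str, int, int, int, int]]:
    """Return (name, length, offset, linebases, linewidth) rows like a .fai file.

    Assumes ASCII text, Unix (LF) line endings, and that every sequence line
    in a record except the last has the same length.
    """
    entries = []
    pos = 0  # running byte offset into the file
    name = None
    length = seq_offset = line_bases = line_width = 0
    for line in fasta_text.splitlines(keepends=True):
        if line.startswith(">"):
            if name is not None:
                entries.append((name, length, seq_offset, line_bases, line_width))
            name = line[1:].split()[0]  # the ID is the first word of the header
            length, line_bases, line_width = 0, 0, 0
            seq_offset = pos + len(line)  # first base starts right after the header
        else:
            bases = line.rstrip("\n")
            if line_bases == 0:  # the first sequence line defines the line layout
                line_bases, line_width = len(bases), len(line)
            length += len(bases)
        pos += len(line)
    if name is not None:
        entries.append((name, length, seq_offset, line_bases, line_width))
    return entries


def fetch(fasta_text: str, entry, start: int, end: int) -> str:
    """Hypothetical helper: 0-based, half-open random access using one index row."""
    _, length, offset, line_bases, line_width = entry
    out = []
    for i in range(start, min(end, length)):
        # Jump over whole lines (newline bytes included), then step within the line.
        byte = offset + (i // line_bases) * line_width + i % line_bases
        out.append(fasta_text[byte])
    return "".join(out)


if __name__ == "__main__":
    demo = ">tx1 desc\nACGTACGT\nACG\n>tx2\nGGGG\n"
    for row in fai_entries(demo):
        print("\t".join(map(str, row)))
```

On the demo input this prints one tab-separated row per record, in the same column order as a real `.fai` file.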
The `find_sv_transcriptome.py` script expects the following:
- `query_fasta` - de novo transcriptome assembly FASTA of your reads (such as those from RNA-Bloom)
- `gtf` - transcript annotation GTF (such as those from Ensembl, UCSC, etc.)
- `genome_fasta` - reference genome FASTA file
- `outdir` - directory path where output files will be generated (i.e. not a prefix)
- `--tbam` - BAM file of query sequences aligned to reference transcripts
- `--gbam` - BAM file of query sequences aligned to reference genome
- `--r2c` - BAM file of reads aligned to transcriptome assembly
- `--transcripts_fasta` - reference transcript sequences
- `--genome_index` - GMAP index directory and name for reference genome

@readmanchiu can correct me if I am wrong. Hope that helps!
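The arguments above can be wired together as in the following sketch, which only builds the command line and does not run it. The option order follows the usage line from the original question; the `SvInputs` container and all file names are hypothetical placeholders. The existence check assumes the transcripts FASTA index sits next to the FASTA as `<name>.fai`, which is where `samtools faidx` writes it by default.

```python
from __future__ import annotations

import os
import shlex
from dataclasses import dataclass


@dataclass
class SvInputs:
    """Hypothetical container for the find_sv_transcriptome.py inputs."""
    query_fasta: str        # de novo transcriptome assembly (e.g. RNA-Bloom contigs)
    gtf: str                # transcript annotation GTF
    genome_fasta: str       # reference genome FASTA
    outdir: str             # output directory (not a prefix)
    tbam: str               # contigs aligned to reference transcripts
    gbam: str               # contigs aligned to reference genome
    r2c: str                # reads aligned to the assembly
    transcripts_fasta: str  # reference transcript sequences (indexed)
    genome_index: str       # GMAP index directory and name


def build_command(inp: SvInputs) -> list[str]:
    """Assemble argv in the usage-line order: options first, then positionals."""
    return [
        "find_sv_transcriptome.py",
        "--gbam", inp.gbam,
        "--tbam", inp.tbam,
        "--transcripts_fasta", inp.transcripts_fasta,
        "--genome_index", inp.genome_index,
        "--r2c", inp.r2c,
        inp.query_fasta, inp.gtf, inp.genome_fasta, inp.outdir,
    ]


def missing_inputs(inp: SvInputs) -> list[str]:
    """List input files that do not exist, including the assumed .fai index.

    genome_index (a directory plus a name) and outdir (an output location)
    are not checked.
    """
    required = [
        inp.query_fasta, inp.gtf, inp.genome_fasta,
        inp.tbam, inp.gbam, inp.r2c,
        inp.transcripts_fasta, inp.transcripts_fasta + ".fai",
    ]
    return [path for path in required if not os.path.exists(path)]


if __name__ == "__main__":
    inp = SvInputs(
        query_fasta="contigs.fa", gtf="annotation.gtf", genome_fasta="genome.fa",
        outdir="sv_out", tbam="c2t.bam", gbam="c2g.bam", r2c="r2c.bam",
        transcripts_fasta="transcripts.fa", genome_index="gmap_db/genome",
    )
    print(shlex.join(build_command(inp)))
    print("missing:", missing_inputs(inp))
```

Keeping the inputs in one named structure makes it harder to swap two positional arguments, which is easy to do by hand with four positionals in a row.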
Thank you! I will try it as you said.
Thanks @kmnip for answering for me; somehow this slipped through my emails.
Yes, the descriptions are all correct.
I think a common problem is a mismatch between the GTF file and the transcripts FASTA file, which I suspect is the cause of the next issue you reported.
The detected SV events are referenced by the transcript IDs provided in the GTF file, and in order to extract the transcript sequences, the transcripts FASTA must use the same IDs.
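One quick way to spot such a mismatch is to compare the `transcript_id` values in the GTF with the FASTA header IDs, for example to catch a version suffix such as `.1` that appears in one file but not the other. The helpers below are hypothetical and not part of the package; they assume the ID is the first whitespace-delimited word of each FASTA header.

```python
from __future__ import annotations

import re

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def gtf_transcript_ids(gtf_lines) -> set[str]:
    """Collect transcript_id values from the attribute column (9th field)."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a full GTF record
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def fasta_ids(fasta_lines) -> set[str]:
    """Assume the ID is the first whitespace-delimited token of each header."""
    return {line[1:].split()[0] for line in fasta_lines
            if line.startswith(">") and line[1:].split()}


def missing_from_fasta(gtf_lines, fasta_lines) -> list[str]:
    """GTF transcript IDs that have no identically named FASTA record."""
    return sorted(gtf_transcript_ids(gtf_lines) - fasta_ids(fasta_lines))


if __name__ == "__main__":
    gtf = ['chr1\tsrc\texon\t1\t10\t.\t+\t.\tgene_id "G1"; transcript_id "ENST0001.1";']
    fasta = [">ENST0001 description", "ACGT"]
    print(missing_from_fasta(gtf, fasta))
```

Because both functions accept any iterable of lines, open file handles can be passed in directly.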
That is why I provided the `extract_transcript_sequence.py` script in the package: it generates the transcripts FASTA from the GTF, so the IDs are guaranteed to match. Hopefully this will solve the problem you encountered.
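For illustration only, here is a minimal sketch of the general technique of building transcript sequences from a GTF: concatenate each transcript's `exon` records in genomic order (GTF coordinates are 1-based and inclusive) and reverse-complement minus-strand transcripts. It is not the code of `extract_transcript_sequence.py`; the genome is assumed to be already loaded into a `dict` mapping chromosome name to sequence, and `extract_transcripts`/`to_fasta` are hypothetical names.

```python
from __future__ import annotations

import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def extract_transcripts(genome: dict[str, str], gtf_lines) -> dict[str, str]:
    """Splice exon sequences per transcript_id; keys are the GTF IDs verbatim."""
    exons = defaultdict(list)  # transcript_id -> [(start, end), ...]
    info = {}                  # transcript_id -> (chrom, strand)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon records contribute sequence
        match = TRANSCRIPT_ID.search(fields[8])
        if not match:
            continue
        tx = match.group(1)
        exons[tx].append((int(fields[3]), int(fields[4])))
        info[tx] = (fields[0], fields[6])
    seqs = {}
    for tx, blocks in exons.items():
        chrom, strand = info[tx]
        # 1-based inclusive [start, end] maps to the Python slice [start - 1:end].
        seq = "".join(genome[chrom][s - 1:e] for s, e in sorted(blocks))
        if strand == "-":
            seq = seq.translate(COMPLEMENT)[::-1]  # reverse complement
        seqs[tx] = seq
    return seqs


def to_fasta(seqs: dict[str, str]) -> str:
    """Write one single-line record per transcript."""
    return "".join(f">{tx}\n{seq}\n" for tx, seq in seqs.items())
```

Since the FASTA headers are taken straight from the GTF `transcript_id` values, the two files cannot disagree on IDs, which is the property described above.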
Thanks for reporting the issue.
I'm running PV to detect transcriptome structural variants, but I'm not sure about some parameters of this command:
find_sv_transcriptome.py --gbam <contigs_to_genome_bam> --tbam <contigs_to_transcripts_bam> --transcripts_fasta <indexed_transcripts_fasta> --genome_index <GMAP index genome directory and name> --r2c <reads_to_contigs_bam> <contigs_fasta> <gtf> <genome_fasta> <outdir>
- `<contigs_to_genome_bam>`, `<contigs_to_transcripts_bam>`, `<contigs_fasta>` and `<reads_to_contigs_bam>`: are these the files corresponding to the samples?
- `<indexed_transcripts_fasta>`: must it be indexed by samtools or by other software?
- `<indexed_transcripts_fasta>`: does it mean the files produced after aligning the raw data to the reference?
- `<gtf>`: does it mean the genome GTF (genome.gtf)?
- `<outdir>`: can it be outdir/prefix?

Thank you.