bcgsc / pavfinder

Post Assembly Variants Finder

Run PV to detect transcriptome structural variants #9

Closed by yanlina0205 3 years ago

yanlina0205 commented 3 years ago

I'm running PV to detect transcriptome structural variants, but I'm not sure about some parameters of this command: find_sv_transcriptome.py --gbam <contigs_to_genome_bam> --tbam <contigs_to_transcripts_bam> --transcripts_fasta <indexed_transcripts_fasta> --genome_index <GMAP index genome directory and name> --r2c <reads_to_contigs_bam> <contigs_fasta> <gtf> <genome_fasta> <outdir>

Thank you.

kmnip commented 3 years ago

Hi @yanlina0205,

If you have paired-end RNA-seq read files, then I recommend using the fusion-bloom make script, which runs the whole pipeline, from transcriptome assembly through alignments to find_sv_transcriptome.py.

To answer your questions: The FASTA index can be generated with samtools faidx. You will see a *.fai file generated for your FASTA file.
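As a rough illustration of the indexing step, here is a minimal Python sketch that runs `samtools faidx` only when the `.fai` file is missing. The helper names `fai_path` and `ensure_faidx` are hypothetical and not part of pavfinder; the sketch assumes `samtools` is on your `PATH` and that the index is written next to the FASTA as `<fasta>.fai`.

```python
import shutil
import subprocess
from pathlib import Path


def fai_path(fasta):
    """Return the index path that `samtools faidx` writes next to the FASTA."""
    return Path(str(fasta) + ".fai")


def ensure_faidx(fasta):
    """Create <fasta>.fai with `samtools faidx` unless it already exists.

    Hypothetical helper for illustration; not part of pavfinder.
    """
    index = fai_path(fasta)
    if index.exists():
        # Index already present, nothing to do.
        return index
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools not found on PATH")
    # Equivalent to running: samtools faidx <fasta>
    subprocess.run(["samtools", "faidx", str(fasta)], check=True)
    return index
```

Calling `ensure_faidx("transcripts.fa")` should leave a `transcripts.fa.fai` file beside the FASTA (the file name here is just an example).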

The find_sv_transcriptome.py script expects the following:

@readmanchiu can correct if I am wrong. Hope that helps!

yanlina0205 commented 3 years ago

Thank you! I will try it as you said.

readmanchiu commented 3 years ago

Thanks @kmnip for answering for me; somehow this slipped through my emails. Yes, the descriptions are all correct. I think the common problem is a mismatch between the GTF file and the transcripts FASTA file, which I suspect is also the cause of the next issue you reported. The SV events detected are referenced by the transcript IDs provided in the GTF file, and in order to extract the transcript sequences, the transcripts FASTA has to use the same IDs. So I provided the extract_transcript_sequence.py script in the package to generate the transcripts FASTA from the GTF, to make sure this is the case. Hopefully this will solve the problem you encountered.
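To check for this kind of mismatch before running the pipeline, something like the following sketch may help. It is not part of pavfinder: the function names are hypothetical, and it assumes that the GTF stores IDs as `transcript_id "..."` in the ninth column and that a FASTA record's ID is the first whitespace-delimited token of its header line.

```python
import re

# Assumption: GTF attributes carry the ID as transcript_id "<id>";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def gtf_transcript_ids(lines):
    """Collect transcript IDs from the attribute column of GTF lines."""
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def fasta_ids(lines):
    """Collect record IDs (first token after '>') from FASTA lines."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and len(line.strip()) > 1
    }


def compare_ids(gtf_lines, fasta_lines):
    """Return (IDs only in the GTF, IDs only in the FASTA), both sorted."""
    gtf = gtf_transcript_ids(gtf_lines)
    fasta = fasta_ids(fasta_lines)
    return sorted(gtf - fasta), sorted(fasta - gtf)
```

Usage would look roughly like `compare_ids(open("genes.gtf"), open("transcripts.fa"))` (file names are placeholders). If either returned list is non-empty, the two files disagree on transcript IDs, for example when one file carries version suffixes and the other does not.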

Thanks for reporting the issue.