EI-CoreBioinformatics / mikado

Mikado is a lightweight Python 3 pipeline whose purpose is to facilitate the identification of expressed loci from RNA-Seq data and to select the best models in each locus.
https://mikado.readthedocs.io/en/stable/
GNU Lesser General Public License v3.0

how to set the pacbio iso reads for daijin configure #169

Closed xiaoaozqd closed 5 years ago

xiaoaozqd commented 5 years ago
  1. How should the PacBio Iso-Seq reads be specified in the sample_sheet.tsv used by daijin configure? Should I use the FASTQ file converted from the raw BAM file, or the all_quivered_hq_lq.fastq file output by isoseq3?
  2. When both Illumina and PacBio reads are given to daijin, is my sample_sheet.tsv set up correctly?

B33-3_1.clean.fq.gz B33-3_2.clean.fq.gz B33-3 fr-unstranded False
hq_lq_quivered.fastq pacbio fr-unstranded True

  3. How should -al and -as be set when using both Illumina and PacBio reads?

lucventurini commented 5 years ago

Dear @xiaoaozqd, apologies for the late reply. Samples are specified in the sample sheet file, which is a tab-separated file of the following form:

Read1 <TAB> Read2 <TAB> SampleName <TAB> Strandedness <TAB> IsLongRead

For more details, please see the documentation.

So for example to add a long read sample, the line should read:

LongRead1.fq.gz<TAB><TAB>LongReadSample<TAB>Strandedness<TAB>True

Please note the two consecutive tabs between the read file name and the sample name.
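
As an illustration of the layout described above, here is a minimal Python sketch that writes such a sample sheet. It is not part of Mikado or Daijin: the helper names `sample_row` and `write_sheet` are hypothetical, and the file names are simply the ones quoted earlier in this thread. The only point it demonstrates is that a long-read row leaves the Read2 column empty, which produces the two consecutive tabs.

```python
import csv
import io

# Column order as given in the reply above.
COLUMNS = ("Read1", "Read2", "SampleName", "Strandedness", "IsLongRead")


def sample_row(read1, sample, strandedness, read2="", long_read=False):
    """Build one five-column row (hypothetical helper, not a Mikado API).

    For long reads the Read2 field stays empty, which yields the two
    consecutive tabs between the read file name and the sample name.
    """
    if long_read and read2:
        raise ValueError("a long-read sample takes a single read file")
    row = [read1, read2, sample, strandedness, str(long_read)]
    assert len(row) == len(COLUMNS)
    return row


def write_sheet(rows, handle):
    """Write the rows as tab-separated text without a header line."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)


rows = [
    # Paired-end short reads: both read files are given.
    sample_row("B33-3_1.clean.fq.gz", "B33-3", "fr-unstranded",
               read2="B33-3_2.clean.fq.gz"),
    # Long reads: Read2 is left empty and IsLongRead is True.
    sample_row("hq_lq_quivered.fastq", "pacbio", "fr-unstranded",
               long_read=True),
]

buffer = io.StringIO()
write_sheet(rows, buffer)
sheet_text = buffer.getvalue()
print(sheet_text, end="")
```

To produce a file on disk, pass a handle from `open("sample_sheet.tsv", "w", newline="")` to `write_sheet` instead of the in-memory buffer; running `cat -T` on the result should show `^I^I` after the long-read file name.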

I hope this clarifies the method. Please do not hesitate to ask for further clarification.

Kind regards

Luca Venturini

xiaoaozqd commented 5 years ago

Dear Luca, thank you very much for your reply! I have set up the sample sheet file:

cat -T sample_sheet.tsv
B33-3_RRAS17798-V_1.clean.fq.gz^IB33-3_RRAS17798-V_2.clean.fq.gz^IB333^Ifr-unstranded^IFalse
hq_lq_quivered.fastq^I^Ipacbio^Ifr-unstranded^ITrue

I still have two problems:

  1. Which FASTQ file should be used: the one converted from the raw BAM file, or the all_quivered_hq_lq.fastq file output by isoseq3?
  2. How should -al and -as be set to analyse Illumina and PacBio reads together? My daijin configure command is as follows:

daijin configure --scheduler "" \
    --scoring dmelanogaster_scoring.yaml \
    --copy-scoring dmelanogaster_scoring.yaml \
    -m permissive --sample-sheet sample_sheet.tsv \
    --flank 500 -i 50 26000 --threads 2 \
    --genome Reference/genome_v2.2.fa \
    -al hisat -as stringtie -od Dmelanogaster --name Dmelanogaster \
    -o daijin.yaml --prot-db Reference/uniprot.fasta

The PacBio FASTQ file has not been used. By the way, there is also an error:


```
MissingOutputException in line 886 of /home/zengqd/miniconda3/envs/mikado/lib/python3.6/site-packages/Mikado/daijin/tr.snakefile:
Missing files after 1 seconds:
Dmelanogaster/4-portcullis/output/portcullis_hisat-B333-0.pass.junctions.tab
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
```

lucventurini commented 5 years ago

Dear @xiaoaozqd, apologies for the very late reply. I addressed your problem in #146: Daijin will now explicitly ask for a long-read alignment method when long reads are provided.

Many apologies again for the lateness.