EI-CoreBioinformatics / mikado

Mikado is a lightweight Python3 pipeline whose purpose is to facilitate the identification of expressed loci from RNA-Seq data and to select the best models in each locus.
https://mikado.readthedocs.io/en/stable/
GNU Lesser General Public License v3.0

Trinity doesn't give me any results #421

Closed: NJeanray closed this issue 2 years ago.

NJeanray commented 3 years ago

Hello,

I'm using Mikado on 3 samples. Here is my Daijin configuration file:

# Configuration properties for Daijin. This is an extended Mikado
# configuration file and can be given directly to Mikado itself.
#
align_methods:
  hisat:
  - ''
  star:
  - ''
asm_methods:
  stringtie:
  - ''
  trinity:
  - ''
db_settings:
  # Settings related to DB connection
  # db: the DB to connect to. Required. Default: mikado.db
  db: mikado.db
  # dbtype: Type of DB to use. Choices: sqlite, postgresql, mysql. Default:
  # sqlite.
  dbtype: sqlite
mikado:
  modes:
  # which mode(s) to run Mikado into. Default: permissive (split multiple
  # ORF models unless there is strong BLAST evidence against the
  # decision).
  - permissive
# name: Name to be used for the project
name: Daijin
# out_dir: Output directory for the project
out_dir: Daijin
pick:
  alternative_splicing:
    # Parameters related to how Mikado will select and report alternative
    # splicing events.
    # pad: Boolean flag. If set to true, Mikado will pad transcripts. Please
    # refer to the online documentation.
    pad: true
  chimera_split:
    # Parameters related to the splitting of transcripts in the presence of
    # two or more ORFs.
    # blast_check: Whether to use BLAST information to take a decision. See blast_params
    # for details.
    blast_check: true
    blast_params:
      # Parameters for the BLAST check prior to splitting.
      # leniency: One of 'STRINGENT', 'LENIENT', 'PERMISSIVE'. Please refer to the
      # online documentation for details. Default: STRINGENT
      leniency: STRINGENT
    # execute: Whether to split multi-ORF transcripts at all. Boolean.
    execute: true
  files:
    # Input and output files for Mikado pick.
    # input: Input GTF/GFF3/BED12 file. Default: mikado_prepared.gtf
    input: mikado_prepared.gtf
  run_options:
    intron_range:
    # A range where most of the introns (99%) should fall into. Transcripts
    # with too many introns larger or smaller than what is defined in this
    # range will be penalised in the scoring. Default: [60, 900]
    - 60
    - 10000
  # scoring_file: Scoring file to be used by Mikado.
  scoring_file: /usr/local/lib/python3.8/dist-packages/Mikado/configuration/scoring_files/HISTORIC/hsapiens_scoring.yaml
portcullis:
  # Options related to portcullis
  canonical_juncs: C,S
  do: true
prepare:
  files:
    # Options related to the input and output files.
    # gff: List of the input files.
    gff: []
    # strand_specific_assemblies: List of input assemblies to be considered as strand specific. Any
    # 'reference' input is automatically marked as strand-specific.
    strand_specific_assemblies: []
  # max_intron_length: Maximum length of an intron. Transcripts with introns bigger than this
  # value will be split in various sub-transcripts. Default: 1,000,000
  # bps.
  max_intron_length: 1000000
  # minimum_cdna_length: Minimum length of a transcript to be retained. Default: 200 bps
  minimum_cdna_length: 200
  # strand_specific: Boolean flag. If set to true, all assemblies will be considered as
  # strand-specific. By default, Mikado will consider the strand-
  # specificity of each assembly in isolation, see
  # 'files/strand_specific_assemblies'.
  strand_specific: false
reference:
  genome: /tmpdata/Genome/Homo_Sapiens/release_104/GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa
# scheduler: Scheduler to be used for the project. Set to null if you plan to use
# DRMAA or are using a local machine.
scheduler: SLURM
# seed: Random number generator seed, to ensure reproducibility across runs.
# Set to None('null' in YAML/JSON/TOML files) to let Mikado select a
# random seed every time.
seed: 0
serialise:
  # Settings related to data serialisation
  # codon_table: codon table to use for verifying/modifying the ORFs. Default: 0, ie
  # the universal codon table but enforcing ATG as the only valid start
  # codon.
  codon_table: 0
  files:
    # Options related to input files for serialise
    # transcripts: Input transcripts in FASTA format, ie the output of Mikado prepare.
    transcripts: mikado_prepared.fasta
  # max_regression: if the ORF lacks a valid start site, this percentage indicates how far
  # along the sequence Mikado should look for a good start site. Eg. with
  # a value of 0.1, on a 300bp sequence with an open ORF Mikado would look
  # for an alternative in-frame start codon in the first 30 bps (10% of
  # the cDNA).
  max_regression: 0.2
  # substitution_matrix: Substitution matrix used for the BLAST. This value will be derived
  # from the XML files, but it must be provided here or on the command
  # line when using BLAST tabular data. Default: blosum62, the default for
  # both BLAST and DIAMOND.
  substitution_matrix: blosum62
short_reads:
  r1:
  # Array of left read files.
  - /tmpdata/Processed_data_246/merged_lanes/fq_gz_files/SNIPr_01_03_384_58_5_7_VI_RP_1_S1_ME.R1.fq.gz
  - /tmpdata/Processed_data_246/merged_lanes/fq_gz_files/SNIPr_01_03_383_70_5_19_VI_RP_1_S2_ME.R1.fq.gz
  - /tmpdata/Processed_data_246/merged_lanes/fq_gz_files/SNIPr_01_03_382_35_6_15_VI_RP_1_S11_ME.R1.fq.gz
  r2:
  # Array of right read files. It must be the same length as r1; if one
  # or more of the samples are single-end reads, add an empty string.
  - /tmpdata/Processed_data_246/merged_lanes/fq_gz_files/SNIPr_01_03_384_58_5_7_VI_RP_1_S1_ME.R2.fq.gz
  - /tmpdata/Processed_data_246/merged_lanes/fq_gz_files/SNIPr_01_03_383_70_5_19_VI_RP_1_S2_ME.R2.fq.gz
  - /tmpdata/Processed_data_246/merged_lanes/fq_gz_files/SNIPr_01_03_382_35_6_15_VI_RP_1_S11_ME.R2.fq.gz
  samples:
  # Array of the sample names. It must be the same length as r1.
  - SNIPr_01_03_384_58_5_7_VI_RP_1_S1_ME
  - SNIPr_01_03_383_70_5_19_VI_RP_1_S2_ME
  - SNIPr_01_03_382_35_6_15_VI_RP_1_S11_ME
  strandedness:
  # Array of strand-specificity of the samples. It must be the same
  # length as r1. Valid values: fr-firststrand, fr-secondstrand,
  # fr-unstranded.
  - fr-firststrand
  - fr-firststrand
  - fr-firststrand
# threads: Threads to be used per process
threads: 400
aln_index:
  star: "--limitGenomeGenerateRAM=168632691637"

As you can see, I want to use two methods for the assembly step: StringTie and Trinity.

When I run the pipeline, everything seems to work perfectly, but when I look at the output, I can see that there are no Trinity results:

NJEANRAY@ocslg:/tmpdata/Alternative_Transcripts/daijin_mult/Daijin/3-assemblies/output$ ls -lrt
total 48
lrwxrwxrwx 1 NJEANRAY Utilisa. du domaine  140 Aug 13 09:00 stringtie-0-star-SNIPr_01_03_383_70_2003_5_19_VI_RP_1_S2_ME-0.gtf -> ../stringtie/stringtie-0-star-SNIPr_01_03_383_70_2003_5_19_VI_RP_1_S2_ME-0/stringtie-0-star-SNIPr_01_03_383_70_5_19_VI_RP_1_S2_ME-0.gtf
lrwxrwxrwx 1 NJEANRAY Utilisa. du domaine  142 Aug 13 09:03 stringtie-0-hisat-SNIPr_01_03_383_70_5_19_VI_RP_1_S2_ME-0.gtf -> ../stringtie/stringtie-0-hisat-SNIPr_01_03_383_70_5_19_VI_RP_1_S2_ME-0/stringtie-0-hisat-SNIPr_01_03_383_70_5_19_VI_RP_1_S2_ME-0.gtf
lrwxrwxrwx 1 NJEANRAY Utilisa. du domaine  142 Aug 13 09:05 stringtie-0-star-SNIPr_01_03_382_35_6_15_VI_RP_1_S11_ME-0.gtf -> ../stringtie/stringtie-0-star-SNIPr_01_03_382_35_6_15_VI_RP_1_S11_ME-0/stringtie-0-star-SNIPr_01_03_382_35_6_15_VI_RP_1_S11_ME-0.gtf
lrwxrwxrwx 1 NJEANRAY Utilisa. du domaine  138 Aug 13 09:09 stringtie-0-star-SNIPr_01_03_384_58_5_7_VI_RP_1_S1_ME-0.gtf -> ../stringtie/stringtie-0-star-SNIPr_01_03_384_58_2014_5_7_VI_RP_1_S1_ME-0/stringtie-0-star-SNIPr_01_03_384_58_5_7_VI_RP_1_S1_ME-0.gtf
-rw-r--r-- 1 NJEANRAY Utilisa. du domaine 2052 Aug 13 09:10 stringtie-0-star-SNIPr_01_03_384_58_5_7_VI_RP_1_S1_ME-0.gtf.stats
-rw-r--r-- 1 NJEANRAY Utilisa. du domaine 2053 Aug 13 09:10 stringtie-0-star-SNIPr_01_03_383_70_5_19_VI_RP_1_S2_ME-0.gtf.stats
-rw-r--r-- 1 NJEANRAY Utilisa. du domaine 2051 Aug 13 09:10 stringtie-0-star-SNIPr_01_03_382_35_6_15_VI_RP_1_S11_ME-0.gtf.stats
-rw-r--r-- 1 NJEANRAY Utilisa. du domaine 2050 Aug 13 09:11 stringtie-0-hisat-SNIPr_01_03_383_70_5_19_VI_RP_1_S2_ME-0.gtf.stats
lrwxrwxrwx 1 NJEANRAY Utilisa. du domaine  140 Aug 13 09:14 stringtie-0-hisat-SNIPr_01_03_384_58_5_7_VI_RP_1_S1_ME-0.gtf -> ../stringtie/stringtie-0-hisat-SNIPr_01_03_384_58_5_7_VI_RP_1_S1_ME-0/stringtie-0-hisat-SNIPr_01_03_384_58_5_7_VI_RP_1_S1_ME-0.gtf
-rw-r--r-- 1 NJEANRAY Utilisa. du domaine 2068 Aug 13 09:17 stringtie-0-hisat-SNIPr_01_03_384_58_5_7_VI_RP_1_S1_ME-0.gtf.stats
lrwxrwxrwx 1 NJEANRAY Utilisa. du domaine  144 Aug 13 09:17 stringtie-0-hisat-SNIPr_01_03_382_35_6_15_VI_RP_1_S11_ME-0.gtf -> ../stringtie/stringtie-0-hisat-SNIPr_01_03_382_35_6_15_VI_RP_1_S11_ME-0/stringtie-0-hisat-SNIPr_01_03_382_35_6_15_VI_RP_1_S11_ME-0.gtf
-rw-r--r-- 1 NJEANRAY Utilisa. du domaine 2081 Aug 13 09:18 stringtie-0-hisat-SNIPr_01_03_382_35_6_15_VI_RP_1_S11_ME-0.gtf.stats
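To make the gap explicit, the sketch below compares a directory listing against every assembler/aligner/sample combination declared in the configuration. It assumes, based on the StringTie files above, that outputs are named with the prefix `<assembler>-0-<aligner>-<sample>-`. Trinity's actual output names may differ, so treat `missing_assemblies` as a hypothetical helper for illustration only.

```python
import os
from itertools import product
from typing import Dict, Iterable, List


def missing_assemblies(
    output_files: Iterable[str],
    asm_methods: Iterable[str],
    align_methods: Iterable[str],
    samples: Iterable[str],
) -> Dict[str, List[str]]:
    """Hypothetical helper: map each assembler to its missing aligner/sample pairs.

    A combination counts as present if any file name starts with
    "<assembler>-0-<aligner>-<sample>-", the pattern seen in the StringTie
    outputs; this naming is an assumption for the other assemblers.
    """
    names = [os.path.basename(f) for f in output_files]
    missing: Dict[str, List[str]] = {}
    for asm, aln, sample in product(asm_methods, align_methods, samples):
        prefix = f"{asm}-0-{aln}-{sample}-"
        # The trailing "-" keeps sample "S1" from matching "S11".
        if not any(name.startswith(prefix) for name in names):
            missing.setdefault(asm, []).append(f"{aln}/{sample}")
    return missing


if __name__ == "__main__":
    # In practice the list would come from os.listdir("Daijin/3-assemblies/output").
    files = ["stringtie-0-star-S1-0.gtf", "stringtie-0-hisat-S1-0.gtf"]
    print(missing_assemblies(files, ["stringtie", "trinity"], ["hisat", "star"], ["S1"]))
    # → {'trinity': ['hisat/S1', 'star/S1']}
```

This only reports what is absent. It says nothing about why a given Trinity run stopped.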

So I checked the logs in /Daijin/3-assemblies/logs/trinity, but I don't see any error message. The process just seems to stop without any explanation. For example:

[...]
[563000 lines read] scaff:X lend:56595869 rend:56595919
[564000 lines read] scaff:X lend:56773730 rend:56773805
[565000 lines read] scaff:X lend:57562923 rend:57568382
[566000 lines read] scaff:X lend:65796767 rend:65796843
[567000 lines read] scaff:X lend:69826936 rend:69834907
[568000 lines read] scaff:X lend:75186241 rend:75189110
[569000 lines read] scaff:X lend:76316605 rend:76324683
[570000 lines read] scaff:X lend:81277061 rend:81277142
[571000 lines read] scaff:X lend:101109196 rend:101115093
[572000 lines read] scaff:X lend:105032318 rend:105040878
[573000 lines read] scaff:X lend:115727392 rend:115730355
[574000 lines read] scaff:X lend:119487289 rend:119487605
[575000 lines read] scaff:X lend:135136283 rend:135136353
[576000 lines read] scaff:X lend:150705291 rend:150705365
[577000 lines read] scaff:Y lend:14597613 rend:14597826
CMD: touch star-SNIPr_01_03_382_35_6_15_VI_RP_1_S11_ME-0.sorted.bam.norm_200.bam.-.sam.frag_coverage.wig.ok
CMD: /usr/local/bin/util/support_scripts//define_coverage_partitions.pl star-SNIPr_01_03_382_35_6_15_VI_RP_1_S11_ME-0.sorted.bam.norm_200.bam.-.sam.frag_coverage.wig 1 - > star-SNIPr_01_03_382_35_6_15_VI_RP_1_S11_ME-0.sorted.bam.norm_200.bam.-.sam.minC1.gff
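When a log ends without an error message, the last `CMD:` entry at least shows which step Trinity launched last. In the excerpt above, that is the `define_coverage_partitions.pl` call. The small Python sketch below extracts those commands from a log's text. It splits on the literal `CMD:` marker, so it also copes with several entries run together on one line. The function name `trinity_commands` is hypothetical. Reading the whole log with `open(...).read()` assumes it fits comfortably in memory.

```python
from typing import List


def trinity_commands(log_text: str) -> List[str]:
    """Hypothetical helper: return the commands logged after each 'CMD:' marker.

    Splitting on the marker (rather than line by line) also copes with
    entries that were pasted onto a single line.
    """
    commands = []
    for piece in log_text.split("CMD:")[1:]:
        piece = piece.strip()
        if piece:
            # Keep only the first line; later lines are other log output.
            commands.append(piece.splitlines()[0].strip())
    return commands


if __name__ == "__main__":
    # Shortened version of the excerpt above; in practice, read one of the
    # files in Daijin/3-assemblies/logs/trinity with open(path).read().
    log = ("[577000 lines read] scaff:Y lend:14597613 rend:14597826 "
           "CMD: touch sample.wig.ok "
           "CMD: define_coverage_partitions.pl sample.wig 1 - > sample.minC1.gff")
    print(trinity_commands(log)[-1])
    # → define_coverage_partitions.pl sample.wig 1 - > sample.minC1.gff
```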

Could you please advise?

Thanks in advance,

Best regards, Nathalie

lucventurini commented 3 years ago

Hi @ljyanesm, @swarbred,

Ideas?

swarbred commented 3 years ago

@NJeanray I'm sorry, but based on the information provided I can't really advise. Note that development of Daijin assemble has been discontinued. We have an alternative pipeline in development, built around the same Mikado tool but with greater flexibility, so we are not actively using Daijin ourselves. If there is no additional information on the error, I would suggest generating the Mikado input separately and then running the Mikado steps manually (see the Mikado documentation; we would be happy to advise on that). The successor to Daijin is a component of our REAT toolkit, which I would expect us to be able to recommend as an alternative in the coming months.
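For the manual route, `mikado configure` can take the assemblies as a tab-separated list file (`--list`). The minimal Python sketch below writes such a file. It assumes the column layout used in the Mikado tutorial: file path, label, whether the assembly is strand-specific (`True`/`False`), and an optional score. Check the Mikado documentation for the remaining optional columns. `mikado_list_file` and the file names in the example are hypothetical.

```python
import csv
import io
from typing import Iterable, Optional, Tuple

# (path, label, strand-specific?, optional score); layout assumed from the tutorial.
Assembly = Tuple[str, str, bool, Optional[float]]


def mikado_list_file(assemblies: Iterable[Assembly]) -> str:
    """Hypothetical helper: render assemblies as a tab-separated list for
    `mikado configure --list`."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for path, label, stranded, score in assemblies:
        row = [path, label, str(stranded)]  # str(True) gives "True"
        if score is not None:
            row.append(str(score))  # optional score column
        writer.writerow(row)
    return buf.getvalue()


if __name__ == "__main__":
    # Placeholder file names, not actual Daijin outputs.
    text = mikado_list_file([
        ("stringtie-0-star-S1-0.gtf", "st_star_S1", True, None),
        ("trinity-star-S1.gff3", "tr_star_S1", True, -0.5),
    ])
    print(text, end="")
```

The resulting file would then be passed to `mikado configure --list`, followed by the `mikado prepare`, `mikado serialise` and `mikado pick` steps described in the documentation.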

NJeanray commented 3 years ago

Hello @swarbred,

Thanks for your answer. So I now need to run the aligners (STAR, HISAT) myself. According to your documentation, do I then have to run the command line for the "Mikado prepare" step?