Taiji-pipeline / Taiji

All-in-one analysis pipeline
https://taiji-pipeline.github.io/
BSD 3-Clause "New" or "Revised" License

GeneRanks output files missing? #13

maeleck closed this issue 3 years ago

maeleck commented 3 years ago

Hi, I was able to generate the GeneRanks outputs with a random test dataset, but with these specific input files Taiji processes the data and writes a number of files to the ATAC-seq and RNA-seq folders, yet produces no generank.tsv or network file. I did hit an error saying that STAR needs more RAM; STAR seemed to finish one raw dataset at a time and then exit. I re-ran Taiji several times so STAR could process the rest of the RNA-seq data, and everything went smoothly afterward.

Here are the config and input files.

Config:

input: "SCsrageo.yml"
output_dir: "TAIJISvarenoutput2/"
assembly: "mm10"
genome: "/mnt/genomes/Mus_musculus/UCSC_mm10/genome.fa"
bwa_index: "/mnt/genomes/Mus_musculus/UCSC_mm10/BWAIndex/genome.fa"
star_index: "/mnt/genomes/Mus_musculus/UCSC_mm10/STARIndex/"
rsem_index: "/mnt/genomes/Mus_musculus/UCSC_mm10/RSEMIndex/genome"
external_network: "pathway.tsv"
tmp_dir: "/tmp"

Input:

ATAC-seq:

- group: 'schwannH3K4meS'
  id: 'H3K4me3SHAM'
  replicates:
  - rep: 2
    files:
    - path: SRR3831679
    - path: SRR3831680
      format: SRA

- group: 'schwannH3K4meC'
  id: 'H3K4me31dpi'
  replicates:
  - rep: 2
    files:
    - path: SRR3831681
    - path: SRR3831682
      format: SRA

- group: 'schwannH3K27me'
  id: 'H3K27me3intact'
  replicates:
  - rep: 2
    files:
    - path: SRR3829897
    - path: SRR3829898
      format: SRA

- group: 'schwannH3K27ac'
  id: 'H3K27ACintact'
  replicates:
  - rep: 2
    files:
    - path: SRR1645296
    - path: SRR1645297
      format: SRA 

- group: 'schwannH3K27acCut'
  id: 'H3K27ACcut'
  replicates:
  - rep: 2
    files:
    - path: SRR1645298
    - path: SRR1645299
      format: SRA

RNA-Seq:

- group: 'schwannsham'
  id: schwannsham_RNA
  replicates:
  - rep: 3
    files:
    - path: SRR5967266
    - path: SRR5967267
    - path: SRR5967268
      format: SRA
      tags: ['GeneQuant']

- group: 'schwann1dpi'
  id: schwann1dpi_RNA
  replicates:
  - rep: 3
    files:
    - path: SRR5967271
    - path: SRR5967272
    - path: SRR5967273
      format: SRA
      tags: ['GeneQuant']

- group: 'schwann2dpi'
  id: schwann2dpi_RNA
  replicates:
  - rep: 3
    files:
    - path: SRR5967274
    - path: SRR5967275
    - path: SRR5967276
      format: SRA
      tags: ['GeneQuant']

- group: 'schwann4dpi'
  id: schwann4dpi_RNA
  replicates:
  - rep: 3
    files:
    - path: SRR5967277
    - path: SRR5967278
    - path: SRR5967279
      format: SRA
      tags: ['GeneQuant']

- group: 'schwann6dpi'
  id: schwann6dpi_RNA
  replicates:
  - rep: 3
    files:
    - path: SRR5967280
    - path: SRR5967281
    - path: SRR5967282
      format: SRA
      tags: ['GeneQuant']

- group: 'Eed_WTintact'
  id: Eed_WTintactID
  replicates:
  - rep: 3
    files:
    - path: SRR6296973
    - path: SRR6296974
    - path: SRR6296975
      format: SRA
      tags: ['GeneQuant']

- group: 'Eed_KOintact'
  id: Eed_KOintactID
  replicates:
  - rep: 3
    files:
    - path: SRR6296982
    - path: SRR6296983
    - path: SRR6296984
      format: SRA
      tags: ['GeneQuant']

- group: 'Eed_KO1dpi'
  id: Eed_KO1dpiID
  replicates:
  - rep: 3
    files:
    - path: SRR6296985
    - path: SRR6296986
    - path: SRR6296987
      format: SRA
      tags: ['GeneQuant']

maeleck commented 3 years ago

Sorry, I fixed the formatting of the input file. It should be easier to copy and paste now.

kaizhang commented 3 years ago

There are several issues in your input file. First, the RNA-seq and ATAC-seq data will not be integrated because the group names don't match. Second, the rep field should hold the replicate's ID, not the number of replicates, so each replicate needs its own entry. For example:

- group: 'Eed_KO1dpi'
  id: Eed_KO1dpiID
  replicates:
  - rep: 1
    files:
    - path: SRR6296985
      format: SRA
  - rep: 2
    files:
    - path: SRR6296986
      format: SRA
  - rep: 3
    files:
    - path: SRR6296987
      format: SRA

If you are not familiar with the YAML format, you can try the TSV format which may be easier for beginners.
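The two problems described above can be caught before launching a long run. The following is only a minimal sketch and not part of Taiji: `check_taiji_input` is a hypothetical helper, it assumes the input file has already been parsed into a Python dict (for example with a YAML parser), and it lowercases the assay keys only so that "RNA-Seq" and "RNA-seq" are treated alike within this check. The "rep looks like a count" test is a heuristic and may flag files that are in fact fine.

```python
from typing import Any, Dict, List


def check_taiji_input(config: Dict[str, List[Dict[str, Any]]]) -> List[str]:
    """Return a list of likely problems in a parsed input file (hypothetical helper)."""
    problems: List[str] = []

    # Collect the group names used under each assay.
    groups = {
        assay.lower(): {sample["group"] for sample in samples}
        for assay, samples in config.items()
    }
    atac = groups.get("atac-seq", set())
    rna = groups.get("rna-seq", set())
    if atac and rna and not (atac & rna):
        problems.append("no group name is shared by ATAC-seq and RNA-seq")

    for samples in config.values():
        for sample in samples:
            replicates = sample.get("replicates", [])
            rep_ids = [r["rep"] for r in replicates]
            if len(rep_ids) != len(set(rep_ids)):
                problems.append(f"{sample['id']}: rep IDs are not unique: {rep_ids}")
            for r in replicates:
                n_files = len(r.get("files", []))
                # Heuristic: one entry whose rep equals its number of files
                # suggests rep was used as a replicate count, not an ID.
                if n_files > 1 and r["rep"] == n_files:
                    problems.append(
                        f"{sample['id']}: rep {r['rep']} lists {n_files} files; "
                        "rep may be a count rather than a replicate ID"
                    )
    return problems


# Two entries copied from the input file in this thread.
example = {
    "ATAC-seq": [{
        "group": "schwannH3K27ac", "id": "H3K27ACintact",
        "replicates": [{"rep": 2, "files": [
            {"path": "SRR1645296"},
            {"path": "SRR1645297", "format": "SRA"},
        ]}],
    }],
    "RNA-Seq": [{
        "group": "schwannsham", "id": "schwannsham_RNA",
        "replicates": [{"rep": 3, "files": [
            {"path": "SRR5967266"},
            {"path": "SRR5967267"},
            {"path": "SRR5967268", "format": "SRA", "tags": ["GeneQuant"]},
        ]}],
    }],
}

for problem in check_taiji_input(example):
    print(problem)
```

An input rewritten in the style of the example above (one `rep` entry per file, and the same group name used under both assays) should produce an empty list from this check.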

maeleck commented 3 years ago

Thank you very much for the response, I will test it sometime on Tuesday when I am back in town.

maeleck commented 3 years ago

I think it is working, but the software has been running since last night. I will post the result soon, but I have another question: after a run completes, is it possible to change the group names in the input yml, partially reset sciflow.db with the delete Step_name command, and re-run Taiji without having to do the STAR alignment all over again?

Edited: It finally worked, thank you very much! My earlier question still stands; I'm very curious to know this.

maeleck commented 3 years ago

Disregard my earlier question; I ran into an issue. I'm not sure how to resolve it, but maybe I can ignore the warning since I still got the output files.

[ERROR][11-02 09:13] Output_Ranks(6393..) Failed: Prelude.maximum: empty list
CallStack (from HasCallStack):
  error, called at src/Control/Workflow/Interpreter/Exec.hs:145:37 in SciFlow-0.7.2-Jc8TJcu7aUL61DWlZpDMFY:Control.Workflow.Interpreter.Exec
[ERROR][11-02 09:13] Program exits with errors

kaizhang commented 3 years ago

This bug doesn't affect the rank file output; however, the visualization will not be generated. I will fix this in the next release.