bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

Bulk RNA-seq Run the analysis problem #3584

Closed panic0918 closed 2 years ago

panic0918 commented 2 years ago

Thank you so much for your help with the installation.

I'm preparing a bulk RNA-seq run by following https://bcbio-nextgen.readthedocs.io/en/latest/contents/bulk_rnaseq.html

However, I'm stuck at the "Run the analysis" step of that guide.

ngs@basiclab01:/exdisk/sda1/seqc/work$ bcbio_nextgen.py ../config/seqc.yaml -n 8
Running bcbio version: {VERSION}
global config: /exdisk/sda1/seqc/work/bcbio_system.yaml
run info config: /exdisk/sda1/seqc/config/seqc.yaml
[2021-12-11T09:45Z] System YAML configuration: /exdisk/sda1/bcbio/galaxy/bcbio_system.yaml.
[2021-12-11T09:45Z] Locale set to C.UTF-8.
[2021-12-11T09:45Z] Resource requests: picard; memory: 4.00; cores: 1
[2021-12-11T09:45Z] Configuring 6 jobs to run, using 1 cores each with 4.00g of memory reserved for each job
[2021-12-11T09:45Z] Timing: organize samples
[2021-12-11T09:45Z] multiprocessing: organize_samples
[2021-12-11T09:45Z] Using input YAML configuration: /exdisk/sda1/seqc/config/seqc.yaml
[2021-12-11T09:45Z] Checking sample YAML configuration: /exdisk/sda1/seqc/config/seqc.yaml
[2021-12-11T09:45Z] Retreiving program versions from /exdisk/sda1/bcbio/manifest/python-packages.yaml.
[2021-12-11T09:45Z] Retreiving program versions from /exdisk/sda1/bcbio/manifest/r-packages.yaml.
[2021-12-11T09:45Z] Testing minimum versions of installed programs
[2021-12-11T09:45Z] multiprocessing: prepare_sample
[2021-12-11T09:45Z] Preparing UHRR_rep1
[2021-12-11T09:45Z] Preparing HBRR_rep1
[2021-12-11T09:45Z] Preparing UHRR_rep2
[2021-12-11T09:45Z] Preparing UHRR_rep3
[2021-12-11T09:45Z] Preparing HBRR_rep2
[2021-12-11T09:45Z] Preparing HBRR_rep3
[2021-12-11T09:45Z] Resource requests: picard, samtools, star; memory: 0.50, 0.50, 10.00; cores: 1, 1, 10
[2021-12-11T09:45Z] Configuring 1 jobs to run, using 8 cores each with 80.1g of memory reserved for each job
[2021-12-11T09:45Z] Timing: alignment
[2021-12-11T09:45Z] multiprocessing: disambiguate_split
[2021-12-11T09:45Z] multiprocessing: process_alignment
[2021-12-11T09:45Z] Running 1st pass of STAR aligner on /exdisk/sda1/seqc/input/SRR950078_1.fastq.gz and /exdisk/sda1/bcbio/genomes/Hsapiens/hg38/star/

The run always stalls at this point. What could be the problem?

panic0918 commented 2 years ago

Here is some information about the machine in case you need it.

ngs@basiclab01:/exdisk/sda1/seqc/work$ grep -c processor /proc/cpuinfo
36
ngs@basiclab01:/exdisk/sda1/seqc/work$ grep "physical id" /proc/cpuinfo | sort -u | wc -l
2
ngs@basiclab01:/exdisk/sda1/seqc/work$ grep "cpu cores" /proc/cpuinfo | tail -1
cpu cores : 18

naumenko-sa commented 2 years ago

For STAR alignment, the important resource is memory (RAM). My guess is that you have 6 x 4 = 24G of RAM or more?

STAR needs more than 30G of RAM. Some runs with a lot of data run out of memory even with 50G and only pass with 100G.
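One way to check this before launching a run is to read the machine's memory from /proc/meminfo (Linux only). The sketch below is illustrative and not part of bcbio or STAR: the helper names are made up, and the 30 GB default threshold is simply the rule of thumb quoted above.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def enough_memory_for_star(meminfo, needed_gb=30.0):
    """Return (ok, available_gb).

    needed_gb is a rule of thumb from this thread, not a STAR constant.
    """
    # MemAvailable is missing on very old kernels; fall back to MemTotal.
    kb = meminfo.get("MemAvailable", meminfo.get("MemTotal", 0))
    available_gb = kb / 1024 ** 2
    return available_gb > needed_gb, available_gb


if __name__ == "__main__":
    meminfo_path = Path("/proc/meminfo")
    if meminfo_path.exists():
        ok, gb = enough_memory_for_star(parse_meminfo(meminfo_path.read_text()))
        print(f"available: {gb:.1f} GB, above the rule-of-thumb threshold: {ok}")
```

Large inputs may need considerably more than the default threshold, so a passing check here is only a first sanity test.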

panic0918 commented 2 years ago

Hello! It's been a while since I last commented. Christmas is coming soon, and I hope you are enjoying the season. Thanks to your help, I have been practicing bulk RNA-seq and even got compliments from my professor. Thank you again. I have three questions, so please answer when you have time. :)

1. Is the analysis time reasonable?

aligner: star
expression_caller: salmon & kallisto
fusion_caller: arriba & pizzly
-rw-rw-r-- 1 ngs ngs    557 Dec 15 11:46 seqc.yaml
-rw-rw-r-- 1 ngs ngs  63304 Dec 15 16:30 bcbio-nextgen.log

aligner: hisat2
expression_caller: salmon & kallisto
fusion_caller: arriba & pizzly
-rw-rw-r-- 1 ngs ngs    564 Dec 15 17:12 seqc.yaml
-rw-rw-r-- 1 ngs ngs  83933 Dec 15 20:07 bcbio-nextgen.log

aligner: hisat2
expression_caller: cufflinks
fusion_caller: arriba & pizzly
-rw-rw-r-- 1 ngs ngs    546 Dec 15 21:00 seqc.yaml
-rw-rw-r-- 1 ngs ngs 104545 Dec 16 03:18 bcbio-nextgen.log

2. When I use cufflinks, this message appears:

[2021-12-14T23:22Z] BAM record error: found spliced alignment without XS attribute

The analysis still completes if I wait. What does the message mean?

3. I want to use TopHat2, but it is not in the list of aligners. Should I install it separately in that case?

People in my lab admire bcbio, and one or two have even asked me how to install it. I think you have done great work! Thanks once again.
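For question 1, a rough run time can be read off the file listings. The sketch below treats the modification time of seqc.yaml as the approximate start of each run and that of bcbio-nextgen.log as its end. Both that interpretation and the year 2021 (taken from the log timestamps earlier in the thread) are assumptions; the timestamps inside bcbio-nextgen.log give the exact figures.

```python
from datetime import datetime

# (label, seqc.yaml mtime, bcbio-nextgen.log mtime), copied from the listings above.
# Assumptions: the year is 2021 and each config was saved just before its run started.
RUNS = [
    ("star + salmon/kallisto", "2021-12-15 11:46", "2021-12-15 16:30"),
    ("hisat2 + salmon/kallisto", "2021-12-15 17:12", "2021-12-15 20:07"),
    ("hisat2 + cufflinks", "2021-12-15 21:00", "2021-12-16 03:18"),
]


def elapsed_minutes(start, end, fmt="%Y-%m-%d %H:%M"):
    """Whole minutes between two timestamp strings."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def format_hm(minutes):
    """Format a number of minutes as e.g. '4h44m'."""
    hours, mins = divmod(minutes, 60)
    return f"{hours}h{mins:02d}m"


if __name__ == "__main__":
    for label, start, end in RUNS:
        print(f"{label}: {format_hm(elapsed_minutes(start, end))}")
```

Under these assumptions the three runs took roughly 4h44m, 2h55m and 6h18m.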

naumenko-sa commented 2 years ago

Hi @panic0918! I am glad that bcbio works for you!
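On question 2, the Cufflinks message refers to spliced alignments (records whose CIGAR string contains an N operation) that lack the XS strand tag Cufflinks expects. The sketch below counts such records in SAM-formatted text using only the standard library. The function names and the toy records are illustrative; in practice the lines would come from something like `samtools view` on the aligned BAM.

```python
def is_spliced_without_xs(sam_line):
    """True if a SAM alignment has an N (skipped region) in its CIGAR but no XS:A tag."""
    if sam_line.startswith("@"):
        return False  # header line
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return False  # not a complete alignment record
    cigar = fields[5]
    optional_tags = fields[11:]
    has_xs = any(tag.startswith("XS:A:") for tag in optional_tags)
    return "N" in cigar and not has_xs


def count_spliced_without_xs(sam_lines):
    """Count the records that would trigger the message."""
    return sum(1 for line in sam_lines if is_spliced_without_xs(line))


# Made-up records for illustration only (SEQ and QUAL are left as '*').
TOY_SAM = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t50M1000N50M\t*\t0\t0\t*\t*\tNH:i:1",
    "r2\t0\tchr1\t100\t60\t50M1000N50M\t*\t0\t0\t*\t*\tNH:i:1\tXS:A:+",
    "r3\t0\tchr1\t100\t60\t100M\t*\t0\t0\t*\t*\tNH:i:1",
]

if __name__ == "__main__":
    print(count_spliced_without_xs(TOY_SAM))
```

Only r1 is counted here: r2 carries an XS tag and r3 is not spliced.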

panic0918 commented 2 years ago

Thank you for your answer. I recently ran into a problem. I tried to fix it myself, but I'm not getting anywhere. Earlier, I ran the analysis successfully with the three control samples (Control_r1, Control_r2, Control_r3). After that, I wanted results for Treat_r1, so I edited the sample sheet:

ngs@basiclab01:/exdisk/sda1$ vi seqc.csv
samplename,description,category
NA1,Treat_r1,Treat

I tried running it, but the output still contains only the control samples:

ngs@basiclab01:~/myBulk_RNA-seq/2021-12-24_seqc/tpm$ vi tximport-tpm.csv
gene,Control_r1,Control_r2,Control_r3

Is there a way to continue from here?

naumenko-sa commented 2 years ago

So you ran bcbio with 3 samples and then re-ran it with one sample? To apply the changes, you need to delete your project/work directory and re-run. Otherwise bcbio will perform only the last step (upload to the project dir).
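This advice can be scripted. The sketch below is a minimal illustration and not part of bcbio: `reset_work_dir` and `bcbio_command` are hypothetical helpers, and the command line mirrors the one shown earlier in this thread. The demo runs against a temporary directory so that nothing real is deleted.

```python
import shutil
import tempfile
from pathlib import Path


def reset_work_dir(work_dir):
    """Delete work_dir, including all intermediate results, and recreate it empty."""
    work_dir = Path(work_dir)
    if work_dir.exists():
        shutil.rmtree(work_dir)
    work_dir.mkdir(parents=True)
    return work_dir


def bcbio_command(config, cores):
    """Build the command line in the form used earlier in the thread."""
    return ["bcbio_nextgen.py", str(config), "-n", str(cores)]


if __name__ == "__main__":
    # Demo on a throwaway directory containing one stale file.
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp) / "work"
        work.mkdir()
        (work / "stale_checkpoint").touch()
        reset_work_dir(work)
        print("files left in work dir:", len(list(work.iterdir())))
    print(" ".join(bcbio_command("../config/seqc.yaml", 8)))
```

For the project in this thread, that would mean calling `reset_work_dir("/exdisk/sda1/seqc/work")` and then launching the returned command from inside that directory, for example with `subprocess.run(cmd, cwd=work, check=True)`. Deleting the work directory discards all intermediate files, so the whole analysis runs again from the start.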