bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

STAR for mapping RNA-Seq data to hg38 #3432

Closed kimal999 closed 3 years ago

kimal999 commented 3 years ago

Hi,

Can we map RNA-Seq data to hg38 with STAR? I have tried both the stable and development versions, and the STAR directory is empty in both.

Exact bcbio command

bcbio_nextgen.py upgrade --genomes hg38 --aligners star --cores 16

roryk commented 3 years ago

Hi @kimal999,

Hm, this should work: we strip the ALT contigs and patches from the hg38 build so that STAR will work with it. What was the output of the command you ran? Could you try adding the -u skip and --data options to it?
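To make the "strip off the ALT and patches" step concrete, here is a minimal sketch of filtering a FASTA file down to its primary contigs. It is not bcbio's own code. The function names are hypothetical, and the naming rule (ALT contigs end in `_alt`, decoys end in `_decoy`, HLA contigs start with `HLA-`) is an assumption about how the hg38 reference labels those sequences.

```python
def is_primary_contig(name):
    """Hypothetical rule: ALT, decoy and HLA contigs are treated as non-primary."""
    return not (
        name.endswith("_alt")
        or name.endswith("_decoy")
        or name.startswith("HLA-")
    )


def write_simple_fasta(in_handle, out_handle):
    """Copy only primary contigs from in_handle to out_handle.

    Returns the names of the contigs that were kept, in file order.
    """
    kept = []
    keep = False
    for line in in_handle:
        if line.startswith(">"):
            # The contig name is the first whitespace-delimited token of the header.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            keep = is_primary_contig(name)
            if keep:
                kept.append(name)
        if keep:
            out_handle.write(line)
    return kept
```

Both arguments are ordinary text file handles, so the function could be called with `open("hg38.fa")` and `open("hg38-simple.fa", "w")`. The log output later in this thread shows bcbio building a comparable `hg38-simple.fa` before indexing.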

kimal999 commented 3 years ago

Hi @roryk,

Thank you very much for the response.

Exact bcbio command

bcbio_nextgen.py upgrade --genomes hg38 --aligners star --cores 16 -u skip --data

Output

Upgrading bcbio
Upgrading bcbio-nextgen data files
--2021-02-02 10:37:48--  https://github.com/chapmanb/cloudbiolinux/archive/master.tar.gz
Resolving github.com (github.com)... 140.82.112.3
Connecting to github.com (github.com)|140.82.112.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/chapmanb/cloudbiolinux/tar.gz/master [following]
--2021-02-02 10:37:48--  https://codeload.github.com/chapmanb/cloudbiolinux/tar.gz/master
Resolving codeload.github.com (codeload.github.com)... 140.82.114.10
Connecting to codeload.github.com (codeload.github.com)|140.82.114.10|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-gzip]
Saving to: ‘STDOUT’

     0K ........ ........ ........ ........ ........ ........ 2.31M
  3072K ........ ........ ........ .......                    1.73M=2.4s

2021-02-02 10:37:51 (2.04 MB/s) - written to stdout [5191731]

List of genomes to get (from the config file at '{'genomes': [{'dbkey': 'hg38', 'name': 'Human (hg38) full', 'indexes': ['seq', 'twobit', 'bwa', 'hisat2'], 'annotations': ['ccds', 'capture_regions', 'coverage', 'prioritize', 'dbsnp', 'hapmap_snps', '1000g_omni_snps', 'ACMG56_genes', '1000g_snps', 'mills_indels', '1000g_indels', 'clinvar', 'qsignature', 'genesplicer', 'effects_transcripts', 'varpon', 'vcfanno', 'viral', 'purecn_mappability', 'simple_repeat', 'af_only_gnomad', 'transcripts', 'RADAR', 'rmsk', 'salmon-decoys', 'fusion-blacklist', 'mirbase'], 'validation': ['giab-NA12878', 'giab-NA24385', 'giab-NA24631', 'platinum-genome-NA12878', 'giab-NA12878-remap', 'giab-NA12878-crossmap', 'dream-syn4-crossmap', 'dream-syn3-crossmap', 'giab-NA12878-NA24385-somatic', 'giab-NA24143', 'giab-NA24149', 'giab-NA24694', 'giab-NA24695']}], 'genome_indexes': ['star', 'bwa', 'rtg', 'hisat2'], 'install_liftover': False, 'install_uniref': False}'): Human (hg38) full
hg38 detected, building a simple reference with no alts, decoys or HLA from bcbio/genomes/Hsapiens/hg38/seq/hg38.fa to bcbio/genomes/Hsapiens/hg38/seq/hg38-simple.fa.
Preparing STAR index from bcbio/genomes/Hsapiens/hg38/seq/hg38-simple.fa.
Removing bcbio/genomes/Hsapiens/hg38/seq/hg38-simple.fa.
bcbio-nextgen data upgrade complete.
Upgrade completed successfully.

The STAR directory is still empty.

roryk commented 3 years ago

Hi @kimal999,

Does bcbio/genomes/Hsapiens/hg38/star exist? If you delete it and rerun does it actually build the index?

roryk commented 3 years ago

bcbio should be checking in your-genome-path/genomes/Hsapiens/hg38/star. Is there a partially created STAR directory there?
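The check suggested above can be scripted. The sketch below classifies an index directory as missing, empty, partial, or complete. The function name is hypothetical, and the list of files that mark a finished index is an assumption about what STAR's genome generation normally writes, not something bcbio itself checks.

```python
from pathlib import Path

# Assumption: files a finished STAR index is expected to contain.
EXPECTED_FILES = ("Genome", "SA", "SAindex", "genomeParameters.txt")


def star_index_status(index_dir):
    """Return 'missing', 'empty', 'partial' or 'complete' for index_dir."""
    directory = Path(index_dir)
    if not directory.is_dir():
        return "missing"
    present = {entry.name for entry in directory.iterdir()}
    if not present:
        return "empty"
    # Any expected file that is absent marks the index as only partly built.
    if all(name in present for name in EXPECTED_FILES):
        return "complete"
    return "partial"
```

Calling `star_index_status("your-genome-path/genomes/Hsapiens/hg38/star")` with the real genome path filled in would distinguish an empty directory from a partially written one, which is the case worth deleting before a rerun.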

kimal999 commented 3 years ago

Hi @roryk,

I'm very sorry for the late reply. I think it is a memory problem. I was able to generate the index outside of bcbio.

Thanks again for the help.
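For anyone hitting the same symptom, a rough way to test the memory hypothesis before building the index by hand is sketched below. The command actually used to build the index outside bcbio was not posted, so everything here is an assumption: the helper names are hypothetical, the figure of about 10 bytes of RAM per genome base is a rule of thumb, and the FASTA and output paths are placeholders. The STAR options used (`--runMode genomeGenerate`, `--genomeDir`, `--genomeFastaFiles`, `--runThreadN`, `--limitGenomeGenerateRAM`) are standard STAR flags.

```python
import os

# Assumption: rule of thumb of roughly 10 bytes of RAM per genome base.
BYTES_PER_BASE = 10


def genome_length_from_fai(fai_path):
    """Sum the sequence lengths (column 2) of a samtools-style .fai index."""
    total = 0
    with open(fai_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:
                total += int(fields[1])
    return total


def estimated_index_ram_bytes(fai_path):
    """Rough RAM estimate for index generation, based on genome length."""
    return BYTES_PER_BASE * genome_length_from_fai(fai_path)


def physical_ram_bytes():
    """Total physical memory; these sysconf names are available on Linux."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def star_generate_command(genome_dir, fasta, threads, ram_limit_bytes=None):
    """Build (but do not run) a STAR genomeGenerate command line."""
    cmd = [
        "STAR",
        "--runMode", "genomeGenerate",
        "--genomeDir", str(genome_dir),
        "--genomeFastaFiles", str(fasta),
        "--runThreadN", str(threads),
    ]
    if ram_limit_bytes is not None:
        cmd += ["--limitGenomeGenerateRAM", str(ram_limit_bytes)]
    return cmd
```

If `estimated_index_ram_bytes(...)` exceeds `physical_ram_bytes()`, the build is likely to fail for lack of memory. The list returned by `star_generate_command` can be passed to `subprocess.run` once the paths point at a real FASTA file and an existing output directory.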