bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

Latest STAR release 2.7.0d requires genome index rebuild #2693

Open rdocking opened 5 years ago

rdocking commented 5 years ago

Hi there,

I recently upgraded my bcbio install (I had to use the method recommended at https://github.com/bcbio/bcbio-nextgen/issues/2676 to get around conda issues). Testing the new install, I saw the following error on a small-RNA run:

EXITING because of FATAL ERROR: Genome version: 20201 is INCOMPATIBLE with running STAR version: 2.7.0d
SOLUTION: please re-generate genome from scratch with running version of STAR, or with version: 2.7.0d

It looks like the new version of STAR (released a few days ago) requires rebuilds of the genome indices:

https://github.com/alexdobin/STAR/releases

For now, I'm going to try to downgrade like so:

bcbio_conda install STAR=2.6.1d

Looks like this will require rebuilds of a lot of genome indices - thanks for looking into this!

rdocking commented 5 years ago

Looks like downgrading to 2.6.1d resolves the issue - might make sense to pin this version until rebuilt indices are available. Looks like the 2.7.x series of STAR also has a bunch of new single-cell analysis stuff incorporated.

roryk commented 5 years ago

Thanks for the heads up @rdocking. Pretty awesome that you opened the issue and found a fix within 15 minutes. I pinned to 2.6.1d for now, but will leave this issue open until we update the indices and move to 2.7+.

rdocking commented 5 years ago

Sounds good - happy to help!

roryk commented 5 years ago

I'm working on updating this to go along with the work on arriba, so this should be all set by the next release.

naumenko-sa commented 4 years ago

STAR is still pinned to 2.6.1d: https://github.com/chapmanb/cloudbiolinux/blob/master/contrib/flavor/ngs_pipeline_minimal/packages-conda.yaml#L153

The newer STAR 2.7.3a supports, among other features, spliced/unspliced alignments in single-cell data: https://github.com/alexdobin/STAR/releases

We don't store the STAR index in a bucket, and we don't download it via a recipe. Instead, we generate it with:

bcbio_nextgen.py upgrade -u --data --genomes hg38 --aligners star --cores 10

See https://github.com/chapmanb/cloudbiolinux/blob/master/cloudbio/biodata/genomes.py#L736

So can we just unpin STAR and recommend deleting the old index and generating a new one, or am I missing something?
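
Before deleting an index, it may help to check which STAR version built it. The sketch below is only an illustration: it assumes the index directory contains a genomeParameters.txt file with a whitespace-separated versionGenome entry, and the helper name read_index_version is made up for this example (it is not part of bcbio or STAR).

```python
import os


def read_index_version(index_dir):
    """Return the versionGenome value recorded in a STAR index, or None.

    Assumption: the index directory holds a genomeParameters.txt file
    with a line of the form "versionGenome<whitespace><value>".
    """
    params = os.path.join(index_dir, "genomeParameters.txt")
    if not os.path.isfile(params):
        return None
    with open(params) as handle:
        for line in handle:
            fields = line.split()
            # Skip comment lines and unrelated keys.
            if len(fields) >= 2 and fields[0] == "versionGenome":
                return fields[1]
    return None
```

An index reporting 20201, as in the error at the top of this issue, would be one to regenerate before moving to STAR 2.7.x.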

smoe commented 3 years ago

Trying the above triggers

def _get_data_dir():
    base_dir = os.path.realpath(os.path.dirname(os.path.dirname(os.path.realpath(sys.executable))))
    if "anaconda" not in os.path.basename(base_dir) and "virtualenv" not in os.path.basename(base_dir):
        raise ValueError("Cannot update data for bcbio-nextgen not installed by installer.\n"
                         "bcbio-nextgen needs to be installed inside an anaconda environment \n"
                         "located in the same directory as the `genomes` directory.")
    return os.path.dirname(base_dir)

which is unfortunate - yes, I have fun with a bare metal installation without conda and a recent STAR install.

Would you accept a patch that reads the data directory from an environment variable and falls back to the current "anaconda" check if that variable is not set or the directory does not exist?
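
A minimal sketch of what such a patch could look like, under stated assumptions: the variable name BCBIO_DATA_DIR is hypothetical (bcbio does not define it), and the environ and executable parameters exist only so the function can be exercised without touching the real environment. The fallback branch mirrors the check quoted above.

```python
import os
import sys

# Hypothetical variable name, chosen for illustration only.
DATA_DIR_ENV = "BCBIO_DATA_DIR"


def get_data_dir(environ=None, executable=None):
    """Locate the bcbio data directory.

    Prefer the directory named by the environment variable when it is set
    and exists; otherwise fall back to the installer-layout check.
    """
    environ = os.environ if environ is None else environ
    executable = sys.executable if executable is None else executable

    override = environ.get(DATA_DIR_ENV)
    if override and os.path.isdir(override):
        return os.path.realpath(override)

    # Fallback: same logic as the existing _get_data_dir().
    base_dir = os.path.realpath(
        os.path.dirname(os.path.dirname(os.path.realpath(executable))))
    name = os.path.basename(base_dir)
    if "anaconda" not in name and "virtualenv" not in name:
        raise ValueError(
            "Cannot update data for bcbio-nextgen not installed by installer.\n"
            "Set %s or install inside an anaconda environment located in "
            "the same directory as the `genomes` directory." % DATA_DIR_ENV)
    return os.path.dirname(base_dir)
```

With this shape, a bare-metal install would only need to export the variable before running the upgrade command, while installer-based setups keep their current behavior.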

roryk commented 3 years ago

Hi @smoe,

It has been on my TODO list forever to add support for the newer versions of STAR, so thanks for bringing this up again. I think I'm going to support both versions for a while with some deprecation warnings, just to ease the transition for folks, since rebuilding the indices is going to cause pain.

Yes, I think a patch with that behavior makes sense.