BrooksLabUCSC / flair

Full-Length Alternative Isoform analysis of RNA

`'run_id' is not defined` error while running `collapse-range` #281

Closed lvclark closed 1 year ago

lvclark commented 1 year ago

Copy and paste the exact command you tried to run

apptainer exec --bind /gpfs/shared_data/references/Homo_sapiens/NCBI/hg38:/mnt \
/gpfs/shared_data/singularity/flair/flair_2.0.0.sif \
flair collapse-range \
--threads $OMP_NUM_THREADS \
-g /mnt/Sequence/WholeGenomeFasta/GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fna \
--gtf /mnt/Annotation/GCA_000001405.15_GRCh38_full_analysis_set.refseq_annotation.gtf \
-q $OUTDIR/heart.flair_corrected.bed \
-r $OUTDIR/cell1.minimap2.aligned.bam,$OUTDIR/cell2.minimap2.aligned.bam,$OUTDIR/cell3.minimap2.aligned.bam,$OUTDIR/cell4.minimap2.aligned.bam \
--output $OUTDIR/heart2.flair_collapsed

I was running a PBS job with 8 threads and 64 GB of memory.

How did you install Flair?

I had to build my own Apptainer container, since htslib and bedPartition are required for collapse-range to run but weren't in the brookslab/flair:2.0.0 image. I made a Docker image and converted it to Apptainer. Here is my Dockerfile.

FROM brookslab/flair:2.0.0

WORKDIR /usr/local/bin

RUN wget http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/bedPartition

RUN chmod 755 bedPartition

WORKDIR /opt

RUN wget --no-check-certificate https://github.com/samtools/htslib/releases/download/1.18/htslib-1.18.tar.bz2

RUN tar -xjvf htslib-1.18.tar.bz2

WORKDIR /opt/htslib-1.18

RUN ./configure --prefix /opt/htslib

RUN make

RUN make install

ENV PATH="${PATH}:/opt/htslib/bin"

WORKDIR /data
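
A quick way to confirm that the rebuilt image exposes the external tools is to look them up on `PATH` from inside the container. The following is only a sketch: the tool list is an assumption based on this thread (bedPartition, plus `bgzip` and `tabix`, which ship with htslib), not a list taken from FLAIR itself.

```python
import shutil

# Assumed tool list, based on this thread rather than on FLAIR's own docs.
REQUIRED_TOOLS = ("bedPartition", "bgzip", "tabix")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All required tools found")
```

Running this with the container's Python (for example through `apptainer exec`) would report any tool that the `ENV PATH` line or the `/usr/local/bin` download failed to expose.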

What happened?

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/usr/local/lib/python3.10/dist-packages/flair/flair.py", line 405, in collapse
    args.o = args.temp_dir+run_id
NameError: name 'run_id' is not defined
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/flair", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/flair/flair.py", line 1056, in main
    status = collapse_range(run_id=run_id,)
  File "/usr/local/lib/python3.10/dist-packages/flair/flair.py", line 276, in collapse_range
    if 1 in p.map(collapse, ranges): # if a process failed
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 367, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 774, in get
    raise self._value
NameError: name 'run_id' is not defined
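
The traceback is consistent with a worker function reading a name that exists only as a local variable (or keyword argument) in its caller. The sketch below reproduces that class of error and one common fix; it is not FLAIR's actual code, and the function names, the `run_id` value, and the regions are hypothetical.

```python
from functools import partial


def collapse_broken(region):
    # 'run_id' is neither a parameter nor a module-level name here,
    # so this line raises NameError when the function is called.
    return f"tmp/{run_id}.{region}"


def collapse_fixed(region, run_id):
    # The identifier is passed in explicitly instead of being looked up globally.
    return f"tmp/{run_id}.{region}"


def main():
    run_id = "abc123"  # hypothetical value; local to main(), invisible to workers
    regions = ["chr1:0-1000", "chr2:0-500"]

    try:
        list(map(collapse_broken, regions))
    except NameError as err:
        print("failed:", err)

    # Bind run_id to the worker before mapping it over the regions.
    print(list(map(partial(collapse_fixed, run_id=run_id), regions)))


if __name__ == "__main__":
    main()
```

The built-in `map` is used here to keep the example simple. `multiprocessing.Pool.map` behaves the same way for this purpose: it re-raises the worker's exception in the parent process, which is why `NameError` appears twice in the output above, and a `partial` of a module-level function can be pickled and sent to pool workers.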

What else do we need to know?

Some of the chromosome names contain underscores, so based on previous experience with Flair I am on the lookout for that causing issues.

My corrected BED is 20 GB, hence the need to use collapse-range! This is PacBio MAS-seq data.

Jeltje commented 1 year ago

Yes, sadly collapse-range hasn't been kept up to date, and rather than fixing it we're in the process of refactoring the code to accommodate batch runs more easily.

In the meantime you can simply repeat what the collapse-range wrapper does. I wrote the commands below without being able to test them, so you may have to make a few tweaks (or feel free to comment again here).

# partition the bed file into independent regions
sort -k1,1 -k2,2n --parallel=<cpus> input.bed > sorted.bed
bedPartition -parallel=<cpus> sorted.bed ranges.bed
bgzip sorted.bed
tabix -f --preset bed --zero-based sorted.bed.gz
mkdir rundir

# write one flair collapse command per region (any extra BED columns are ignored)
while read -r chr start end _; do
    echo "flair collapse --range ${chr}:${start}-${end} -q sorted.bed.gz --threads <cpus> (...rest of your inputs...) --output rundir/${chr}_${start}_${end}.heart2.flair_collapsed"
done < ranges.bed > my.commands

# Run these on your cluster independently, then combine:
cat rundir/*isoforms.bed > $OUTDIR/heart2.flair_collapsed.isoforms.bed
cat rundir/*isoforms.fa > $OUTDIR/heart2.flair_collapsed.isoforms.fa
cat rundir/*isoforms.gtf > $OUTDIR/heart2.flair_collapsed.isoforms.gtf
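
If a shell loop is awkward on your cluster, the command list can also be generated from Python. This is an untested sketch under the same assumptions as the commands above: `ranges.bed` has chromosome, start and end in its first three columns, `flair collapse` accepts `--range`, and the function name, default paths and `extra_args` parameter are made up for illustration. Output prefixes use underscores between chromosome, start and end so that regions cannot produce colliding file names.

```python
import shlex
from pathlib import Path


def build_commands(ranges_path, query="sorted.bed.gz", outdir="rundir",
                   prefix="heart2.flair_collapsed", extra_args=()):
    """Return one 'flair collapse --range ...' command string per region."""
    commands = []
    for line in Path(ranges_path).read_text().splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        chrom, start, end = fields[:3]
        out = f"{outdir}/{chrom}_{start}_{end}.{prefix}"
        cmd = ["flair", "collapse", "--range", f"{chrom}:{start}-{end}",
               "-q", query, *extra_args, "--output", out]
        commands.append(shlex.join(cmd))
    return commands


if __name__ == "__main__":
    # Print the commands only if a ranges.bed is present in the working directory.
    if Path("ranges.bed").exists():
        print("\n".join(build_commands("ranges.bed")))
```

The remaining inputs (genome, GTF, reads, thread count) would go in `extra_args`, for example `extra_args=["--threads", "8"]`.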

lvclark commented 1 year ago

Ok, thanks for the reply! I am in the process of doing something similar based on the shell script that I found in the repo.