GenomicsAotearoa / long-read-assembly

Long-read assembly workshop
https://genomicsaotearoa.github.io/long-read-assembly/
GNU General Public License v3.0

fcs install #14

Closed: DininduSenanayake closed this issue 1 year ago

DininduSenanayake commented 1 year ago

Pull the image from upstream

DininduSenanayake commented 1 year ago
$ module load Singularity 

$ singularity inspect /opt/nesi/containers/fcs/fcs-gx-0.4.0.sif 
VERSION: production-0.4.0
org.label-schema.build-date: Friday_10_March_2023_21:7:54_UTC
org.label-schema.schema-version: 1.0
org.label-schema.usage.singularity.deffile.bootstrap: docker
org.label-schema.usage.singularity.deffile.from: us-east4-docker.pkg.dev/ncbi-seqplus-rodr-build-res/ncbi-cgr/fcs/gx:production-0.4.0
org.label-schema.usage.singularity.version: 3.4.0-1
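The `deffile.from` label above records which upstream image the NeSI container was built from. If that image ever needs to be pulled again from upstream, the pull command can be reconstructed from the label. This is a minimal sketch. It assumes the image was bootstrapped from a Docker registry, as the `deffile.bootstrap: docker` label indicates. `upstream_pull_cmd` is a hypothetical helper that only prints the command and does not run it:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of NeSI or FCS): read `singularity inspect`
# output on stdin and print a `singularity pull` command that would fetch
# the same image from its upstream registry. Nothing is executed.
# Assumes bootstrap: docker, so the URI is prefixed with docker://.
upstream_pull_cmd() {
    local out_sif="$1"   # name for the local .sif file, e.g. fcs-gx-0.4.0.sif
    local uri
    # Take the value of the first deffile.from label (split on colon+space,
    # so the ":production-0.4.0" tag inside the URI is kept intact).
    uri=$(awk -F': ' '!found && /deffile\.from:/ { print $2; found = 1 }' | tr -d '[:space:]')
    if [ -z "$uri" ]; then
        echo "no deffile.from label found in input" >&2
        return 1
    fi
    printf 'singularity pull %s docker://%s\n' "$out_sif" "$uri"
}
```

Usage:

    singularity inspect /opt/nesi/containers/fcs/fcs-gx-0.4.0.sif | upstream_pull_cmd fcs-gx-0.4.0.sif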
juklucas commented 1 year ago

Hey @DininduSenanayake, I tried to run the tool following the instructions here, and it threw an error.

I ran

curl -LO https://github.com/ncbi/fcs/raw/main/dist/fcs.py
curl -LO https://github.com/ncbi/fcs/raw/main/examples/fcsgx_test.fa.gz

python3 ./fcs.py \
    screen genome \
    --fasta ./fcsgx_test.fa.gz \
    --out-dir ./gx_out/ \
    --gx-db /nesi/nobackup/nesi02659/LRA/resources/fcs/test-only  \
    --tax-id 6973 

And got

python3: error while loading shared libraries: libpython3.10.so.1.0: cannot open shared object file: No such file or directory
DininduSenanayake commented 1 year ago

@juklucas Can you try setting the FCS_DEFAULT_IMAGE variable first with export FCS_DEFAULT_IMAGE=/opt/nesi/containers/fcs/fcs-gx-0.4.0.sif and then calling python3 ./fcs.py ..., i.e.:

module purge
module load Singularity/3.11.3

export FCS_DEFAULT_IMAGE=/opt/nesi/containers/fcs/fcs-gx-0.4.0.sif

python3 ./fcs.py \
    screen genome \
    --fasta ./fcsgx..
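The libpython3.10.so.1.0 error is raised by the dynamic loader while the python3 binary is starting, before fcs.py runs at all. That makes two separate things worth checking before a screen: that python3 itself launches (the working recipe later in this thread runs module purge and then loads Python/3.8.2-gimkl-2020a), and that FCS_DEFAULT_IMAGE points at the NeSI .sif file. This is a minimal sketch. `fcs_preflight` is a hypothetical helper and is not part of fcs.py:

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of fcs.py) covering the two
# problems discussed in this thread.
fcs_preflight() {
    # 1. FCS_DEFAULT_IMAGE should name an existing .sif image.
    if [ -z "${FCS_DEFAULT_IMAGE:-}" ]; then
        echo "FCS_DEFAULT_IMAGE is not set" >&2
        return 1
    fi
    if [ ! -f "$FCS_DEFAULT_IMAGE" ]; then
        echo "FCS_DEFAULT_IMAGE points to a missing file: $FCS_DEFAULT_IMAGE" >&2
        return 1
    fi
    # 2. The python3 on PATH must actually start. A missing libpython*.so
    #    makes this fail with "error while loading shared libraries".
    if ! python3 -c 'import sys' >/dev/null 2>&1; then
        echo "python3 failed to start; check which Python module is loaded" >&2
        return 1
    fi
    return 0
}
```

Usage: run `fcs_preflight && python3 ./fcs.py screen genome ...` so the screen only starts when both checks pass.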
DininduSenanayake commented 1 year ago

Confirming this is all working; I am updating the corresponding section in the repo.

module purge 
module load Python/3.8.2-gimkl-2020a
module load Singularity
export FCS_DEFAULT_IMAGE=/opt/nesi/containers/fcs/fcs-gx-0.4.0.sif
curl -LO https://github.com/ncbi/fcs/raw/main/dist/fcs.py
curl -LO https://github.com/ncbi/fcs/raw/main/examples/fcsgx_test.fa.gz
python3 ./fcs.py \
    screen genome \
    --fasta ./fcsgx_test.fa.gz \
    --out-dir ./gx_out/ \
    --gx-db /nesi/nobackup/nesi02659/LRA/resources/fcs/test-only \
    --tax-id 6973

--------------------------------------------------------------------

tax-id    : 6973
fasta     : /sample-volume/fcsgx_test.fa.gz
size      : 8.55 MiB
split-fa  : True
BLAST-div : roaches
gx-div    : anml:insects
w/same-tax: True
bin-dir   : /app/bin
gx-db     : /app/db/gxdb/test-only/test-only.gxi
gx-ver    : Mar 10 2023 15:34:33; git:v0.4.0-3-g8096f62
output    : /output-volume//fcsgx_test.fa.6973.taxonomy.rpt

--------------------------------------------------------------------

    GX requires the database to be entirely in RAM to avoid thrashing.
    Consider placing the database files in a non-swappable tmpfs or ramfs.
    See https://github.com/ncbi/fcs/wiki/FCS-GX for details.
    Will prefetch (vmtouch) the database pages to have the OS cache them in main memory.

Prefetching /app/db/gxdb/test-only/test-only.gxs 99%...                                
Prefetched /app/db/gxdb/test-only/test-only.gxs in 0.243985s; 0.290255 GB/s. The file is 100% in RAM.
Prefetching /app/db/gxdb/test-only/test-only.gxi 99%...                         
Prefetched /app/db/gxdb/test-only/test-only.gxi in 7.24798s; 0.62397 GB/s. The file is 100% in RAM.
Collecting masking statistics...
Collected masking stats:  0.0295689 Gbp; 3.21688s; 9.19177 Mbp/s. Baseline: 1.0774

28.2MiB 0:00:20 [1.34MiB/s] [1.34MiB/s] [==========================================================================] 102%            
Processed 714 queries, 29.1754Mbp in 14.3783s. (2.02913Mbp/s); num-jobs:294

Warning: asserted div 'anml:insects' is not represented in the output!

--------------------------------------------------------------------------------------------------
Warning: Asserted tax-div 'anml:insects' is well-represented in db, but absent from inferred-primary-divs.
This means that either asserted tax-div is incorrect, or the input is predominantly contamination.
Will trust the asserted div and treat inferred-primary-divs as contaminants.
--------------------------------------------------------------------------------------------------

Asserted div               : anml:insects
Inferred primary-divs      : ['prok:CFB group bacteria']
Corrected primary-divs     : ['anml:insects']
Putative contaminant divs  : ['prok:CFB group bacteria']
Aggregate coverage         : 51%
Minimum contam. coverage   : 30%

--------------------------------------------------------------------

fcs_gx_report.txt contamination summary:
----------------------------------------
                                seqs      bases
                               ----- ----------
TOTAL                            243   27170378
-----                          ----- ----------
prok:CFB group bacteria          243   27170378

--------------------------------------------------------------------

fcs_gx_report.txt action summary:
---------------------------------
                                seqs      bases
                               ----- ----------
TOTAL                            243   27170378
-----                          ----- ----------
EXCLUDE                          214   25795430
REVIEW                            29    1374948

--------------------------------------------------------------------
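As a quick sanity check on a summary block like the ones above, the per-category rows should add up to the TOTAL row. For the action summary that is 214 + 29 = 243 sequences and 25795430 + 1374948 = 27170378 bases. This is a minimal awk sketch of that check. `check_summary_totals` is a hypothetical helper that parses the console summary block pasted into a file or pipe. It does not read fcs_gx_report.txt itself, because that file's column layout is not shown in this thread:

```shell
#!/usr/bin/env bash
# Hypothetical consistency check (not part of FCS-GX): read ONE summary
# block (contamination or action) on stdin and confirm the per-category
# rows add up to the TOTAL row, for both seqs and bases.
# Exits 0 if they match, 1 otherwise.
check_summary_totals() {
    awk '
        $1 == "TOTAL" { total_seqs = $2 + 0; total_bases = $3 + 0; next }
        # Data rows end in two integers (names may contain spaces, e.g.
        # "prok:CFB group bacteria"); dashed separators and headers are skipped.
        NF >= 3 && $(NF-1) ~ /^[0-9]+$/ && $NF ~ /^[0-9]+$/ {
            seqs += $(NF-1); bases += $NF
        }
        END {
            printf "rows: %d seqs, %d bases; TOTAL: %d seqs, %d bases\n",
                   seqs, bases, total_seqs, total_bases
            exit !(seqs == total_seqs && bases == total_bases)
        }'
}
```

Feed one block at a time, because a second TOTAL row would overwrite the first. For the action summary above, it prints:

    rows: 243 seqs, 27170378 bases; TOTAL: 243 seqs, 27170378 bases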