mariaalexandrastanciu closed this issue 1 week ago.
Hi Alexandra,
I am sorry that you are experiencing problems with the GCparagon Singularity image. I will try to test it locally to understand the problem better.
Do you get the same error (file not found) if you don't specify the paths to the reference files, i.e., if you pass only -b healthy_sWGS/bams/Genome-IJB-HP-10-xx_S81.markDup.bam (since -rgb hg38 is the default)?
BR Benjamin
The error message is different, but the problem is the same. It seems that it still cannot find the file, but it looks for it in the default location.
See below:
$ singularity run gcparagon.sif -b healthy_sWGS/bams/Genome-IJB-HP-10-xx_S81.markDup.bam
cannot proceed - no two-bit reference file defined and default expected file not present under {'hg38': PosixPath('/opt/conda/envs/GCparagon/lib/python3.10/site-packages/GCparagon/2bit_reference/hg38.analysisSet.2bit'), 'hg19': PosixPath('/opt/conda/envs/GCparagon/lib/python3.10/site-packages/GCparagon/2bit_reference/hg19.2bit')}. Terminating ..
I guess you built the singularity image yourself from the current repo (is this correct?).
Have you tried downloading and using the pre-built Singularity image with the following command?
singularity pull --arch amd64 library://bgspiegl/gcparagon/gcparagon-ubuntu-22_04-container:latest && singularity verify gcparagon-ubuntu-22_04-container_latest.sif
Unfortunately, I haven't tested the latest changes with singularity.
Yes, I built the image myself from the current repo.
If I try the pre-built image, I get the following error:
singularity pull --arch amd64 library://bgspiegl/gcparagon/gcparagon-ubuntu-22_04-container:latest
FATAL: Unable to get library client configuration: remote has no library client (see https://apptainer.org/docs/user/latest/endpoint.html#no-default-remote)
Ah, I totally missed that Singularity was split into Singularity CE and Apptainer after Gregory Kurtzer left Sylabs in 2020.
You should be able to fix the problem by running the commands provided at the URL in your last message:
apptainer remote add --no-login SylabsCloud cloud.sycloud.io
apptainer remote use SylabsCloud
apptainer remote list
(the last command should list 'SylabsCloud' among your available remotes)
The command singularity remote list should also list SylabsCloud.
What does it show for you?
After that, the minimal pull command singularity pull library://bgspiegl/gcparagon/gcparagon-ubuntu-22_04-container:latest should work.
I need more information to give you any reasonable support here.
I tested the simplest pull command, singularity pull library://bgspiegl/gcparagon/gcparagon-ubuntu-22_04-container:latest, on another machine and it worked fine.
Which OS are you on?
Which version of Singularity are you using? (Run singularity --version from within the GCparagon conda env; the Singularity version in my GCparagon conda env is 3.8.6.)
[edited - Apptainer/SingularityCE split]
Concerning the "file not found" problem: not every location is accessible from within the Singularity container.
You would have to use paths under, e.g., $HOME for Apptainer/SingularityCE to actually be able to find the files (I also prefer absolute paths here).
If you want to use another directory permanently for GCparagon, you can always mount it as described in the Apptainer docs:
https://apptainer.org/docs/user/latest/quick_start.html#working-with-files
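A minimal sketch of the advice above (absolute paths plus a bind mount), in plain Python with only the standard library. The helper build_singularity_command is hypothetical and not part of GCparagon: it resolves every host path against a base directory (by default the current working directory, which is how a relative path would otherwise be interpreted), fails early if a file is missing on the host, and bind-mounts each file's parent directory with -B. Binding the parent directories is an assumption made for illustration; binding one common top-level directory works just as well. Only singularity run, -B, -b and the GCparagon flags quoted in this thread are taken from the discussion.

```python
from pathlib import Path


def build_singularity_command(image, bam, extra_files=None, base_dir=None):
    """Hypothetical helper: build a 'singularity run' argv with absolute paths.

    extra_files maps a GCparagon flag (e.g. '--two-bit-reference-genome')
    to a host path. Relative paths are resolved against base_dir
    (default: the current working directory).
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    extra_files = extra_files or {}

    def absolute(p):
        # Turn a possibly relative host path into an absolute one and make
        # sure it exists on the host before the container ever sees it.
        path = Path(p)
        path = (path if path.is_absolute() else base / path).resolve()
        if not path.is_file():
            raise FileNotFoundError(f"not found on the host: {path}")
        return path

    bam_path = absolute(bam)
    flag_paths = {flag: absolute(p) for flag, p in extra_files.items()}

    # Collect the unique parent directories in a stable order; -B accepts
    # a comma-separated list of bind paths.
    bind_dirs = []
    for path in [bam_path, *flag_paths.values()]:
        if str(path.parent) not in bind_dirs:
            bind_dirs.append(str(path.parent))

    argv = ["singularity", "run", "-B", ",".join(bind_dirs)]
    argv += [str(image), "-b", str(bam_path)]
    for flag, path in flag_paths.items():
        argv += [flag, str(path)]
    return argv
```

For example, passing base_dir="/globalscratch/ulb/bctr/astanciu" pins a relative path such as resources/hg38/gcpropagon/hg38_analysisSet.2bit to the scratch directory instead of to whatever directory the command happens to be run from; shlex.join(argv) turns the list into a copy-pasteable command line.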
Example of the 2bit reference files in the Singularity image:
For now, the hg38 2bit reference genome file is downloaded to
/opt/github/GCparagon/src/GCparagon/2bit_reference/hg38.analysisSet.2bit
but after pip install the program expects to find it under
/opt/conda/envs/GCparagon/lib/python3.10/site-packages/GCparagon/2bit_reference/hg38.analysisSet.2bit
You should be able to get the expected behaviour by passing the absolute paths under /opt/github/GCparagon/src/GCparagon/, where the files are actually located.
I will look into how to get consistent behaviour regardless of whether the user ran the pip install command (this also applies when using the Singularity image) or not.
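One possible way to get that consistent behaviour, sketched under stated assumptions: instead of a single hard-coded default, look the file up in an ordered list of candidate roots and return the first one that exists. The two roots below are the pip-installed package directory and the cloned source tree mentioned above; the function name find_default_file and the root list are hypothetical, and this is not how GCparagon currently resolves its defaults.

```python
from pathlib import Path

# Hypothetical candidate roots, taken from the paths discussed above:
# the pip-installed package and the cloned source tree inside the image.
CANDIDATE_ROOTS = (
    Path("/opt/conda/envs/GCparagon/lib/python3.10/site-packages/GCparagon"),
    Path("/opt/github/GCparagon/src/GCparagon"),
)


def find_default_file(relative_path, roots=CANDIDATE_ROOTS):
    """Return the first existing <root>/<relative_path>; raise if none exists.

    Illustrative sketch only, not GCparagon's actual lookup logic.
    """
    tried = []
    for root in roots:
        candidate = Path(root) / relative_path
        if candidate.is_file():
            return candidate
        tried.append(str(candidate))
    # Report every location that was checked, which makes a
    # "file not found" message much easier to debug than a single path.
    raise FileNotFoundError(
        f"default file {relative_path!r} not found; tried: {', '.join(tried)}"
    )
```

For instance, find_default_file("2bit_reference/hg38.analysisSet.2bit") would return the source-tree copy if pip install was never run. Further roots can be passed through the roots parameter for files that live elsewhere.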
Hi,
After running the apptainer commands you suggested, I get the following error:
[astanciu@lm4-f001 ~]$ singularity pull --arch amd64 library://bgspiegl/gcparagon/gcparagon-ubuntu-22_04-container:latest
FATAL: While pulling library image: error fetching image: error making request to server: Get "https://library.sylabs.io/v1/images/bgspiegl/gcparagon/gcparagon-ubuntu-22_04-container:latest?arch=amd64": dial tcp: lookup library.sylabs.io on 192.168.254.91:53: no such host
And this is the list I get:
[astanciu@lm4-f001 ~]$ singularity remote list
NAME URI DEFAULT? GLOBAL? EXCLUSIVE? SECURE?
DefaultRemote cloud.apptainer.org ✓ ✓
SylabsCloud cloud.sycloud.io ✓ ✓
I am working on a Linux OS cluster and my singularity version is: apptainer version 1.3.0-1.el8
Hi,
I eventually managed to download the image, but I get the same issue:
singularity run gcparagon-ubuntu-22_04-container_latest.sif -b /healthy_sWGS/bams/Genome-IJB-HP-10-xx_S81.markDup.bam
cannot proceed - no two-bit reference file defined and default expected file not present under {'hg38': PosixPath('/opt/conda/envs/GCparagon/lib/python3.10/site-packages/GCparagon/2bit_reference/hg38.analysisSet.2bit'), 'hg19': PosixPath('/opt/conda/envs/GCparagon/lib/python3.10/site-packages/GCparagon/2bit_reference/hg19.2bit')}. Terminating ..
Does it find the required hg38 files if you specify them like this:
--two-bit-reference-genome /opt/github/GCparagon/src/GCparagon/2bit_reference/hg38.analysisSet.2bit --intervals-bed /opt/github/GCparagon/src/GCparagon/accessory_files/hg38_minimalExclusionListOverlap_1Mbp_intervals_33pcOverlapLimited.FGCD.bed --reference-gc-content-distribution-table /opt/github/GCparagon/src/GCparagon/accessory_files/accessory_files/hg38_reference_GC_content_distribution.tsv
[Edited: exchanged --exclude-intervals with --intervals-bed]
It seems that it moved on to the next file:
singularity run gcparagon.sif -b /globalscratch/ulb/bctr/astanciu/healthy_sWGS/bams/Genome-IJB-HP-10-xx_S81.markDup.bam --two-bit-reference-genome /opt/github/GCparagon/src/GCparagon/2bit_reference/hg38.analysisSet.2bit --exclude-intervals /opt/github/GCparagon/src/GCparagon/accessory_files/hg38_minimalExclusionListOverlap_1Mbp_intervals_33pcOverlapLimited.FGCD.bed --reference-gc-content-distribution-table /opt/github/GCparagon/src/GCparagon/accessory_files/accessory_files/hg38_reference_GC_content_distribution.tsv
cannot proceed - no genomic intervals BED file defined and default expected file not present under /opt/conda/envs/GCparagon/lib/python3.10/accessory_files/hg38_minimalExclusionListOverlap_1Mbp_intervals_33pcOverlapLimited.FGCD.bed. Terminating ..
I am sorry, I made yet another mistake (it is --intervals-bed, not --exclude-intervals). Please try these fixed parameters:
--two-bit-reference-genome /opt/github/GCparagon/src/GCparagon/2bit_reference/hg38.analysisSet.2bit --intervals-bed /opt/github/GCparagon/src/GCparagon/accessory_files/hg38_minimalExclusionListOverlap_1Mbp_intervals_33pcOverlapLimited.FGCD.bed --reference-gc-content-distribution-table /opt/github/GCparagon/src/GCparagon/accessory_files/hg38_reference_GC_content_distribution.tsv
[EDIT: yet another mistake - this time in the path]
I ran this:
singularity run gcparagon-ubuntu-22_04-container_latest.sif -b /globalscratch/ulb/bctr/astanciu/healthy_sWGS/bams/Genome-IJB-HP-10-xx_S81.markDup.bam --two-bit-reference-genome /opt/github/GCparagon/src/GCparagon/2bit_reference/hg38.analysisSet.2bit --intervals-bed /opt/github/GCparagon/src/GCparagon/accessory_files/hg38_minimalExclusionListOverlap_1Mbp_intervals_33pcOverlapLimited.FGCD.bed --reference-gc-content-distribution-table /opt/github/GCparagon/src/GCparagon/accessory_files/hg38_reference_GC_content_distribution.tsv
Error:
Traceback (most recent call last):
  File "/opt/conda/envs/GCparagon/bin/gcparagon", line 8, in <module>
I made another mistake, this time in the default path I assumed to be correct. Please try:
--two-bit-reference-genome /opt/github/GCparagon/src/GCparagon/2bit_reference/hg38.analysisSet.2bit --intervals-bed /opt/github/GCparagon/accessory_files/hg38_minimalExclusionListOverlap_1Mbp_intervals_33pcOverlapLimited.FGCD.bed --reference-gc-content-distribution-table /opt/github/GCparagon/accessory_files/hg38_reference_GC_content_distribution.tsv
I am currently also testing this locally. Sorry for the series of mistakes. I am planning to change the singularity.def file, create another image, and make it available soon.
[Edit: the accessory_files dir is in the outer GCparagon directory; paths changed for the --intervals-bed and --reference-gc-content-distribution-table parameters]
I have the same error:
singularity run gcparagon-ubuntu-22_04-container_latest.sif -b /globalscratch/ulb/bctr/astanciu/healthy_sWGS/bams/Genome-IJB-HP-10-xx_S81.markDup.bam --two-bit-reference-genome /opt/github/GCparagon/src/GCparagon/2bit_reference/hg38.analysisSet.2bit --intervals-bed /opt/github/GCparagon/src/GCparagon/accessory_files/hg38_minimalExclusionListOverlap_1Mbp_intervals_33pcOverlapLimited.FGCD.bed --reference-gc-content-distribution-table /opt/github/GCparagon/src/GCparagon/accessory_files/hg38_reference_GC_content_distribution.tsv
Traceback (most recent call last):
  File "/opt/conda/envs/GCparagon/bin/gcparagon", line 8, in <module>
This is very odd. I am now trying to create a new image from the current code (tested successfully in script form). I am sorry for this persisting problem. An alternative would be to create the conda environment from the YAML file and run GCparagon simply as a python3 script. This, of course, can also fail if the conda environment can't be created for some obscure reason.
I will get back to you when I know more.
I would have tried conda, but unfortunately I am working on a cluster and don't have that option.
Thank you for your help!
Hi Maria.
Thank you once again for bringing up this issue and helping me make GCparagon a more convenient tool.
I think commit 93aa29c solved your issue.
Please also have a look at release v0.6.8 and the updated README.md.
The new singularity_definition_file/build_and_test_singularity_image.sh shows an example of running GCparagon from the container and binding/mounting a directory that is inaccessible to any Singularity container by default (the -B parameter).
Please get the new container:
singularity pull library://bgspiegl/gcparagon/gcparagon_0.6.8-ubuntu-22_04-container:latest
Does the --reference-genome-build hg38 parameter work for you now with this new container?
The problem of the default file paths breaking persists, though, for local installations of GCparagon (running pip install .). I will fix this in another release.
BR Benjamin
Hi,
I have downloaded the image you suggested and I ran it with the -B parameter:
singularity run -B /globalscratch/ulb/bctr/astanciu/ gcparagon_0.6.8-ubuntu-22_04-container_latest.sif --bam /globalscratch/ulb/bctr/astanciu/healthy_sWGS/bams/Genome-IJB-HP-10-xx_S81.markDup.bam
And it worked.
Thank you.
Best regards, Alexandra
Hi,
I am trying to run GCparagon with singularity and I get the following error:
Traceback (most recent call last):
  File "/opt/conda/envs/GCparagon/bin/gcparagon", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/envs/GCparagon/lib/python3.10/site-packages/GCparagon/correct_GC_bias.py", line 3069, in main
    raise AttributeError(f"2bit reference genome file '{two_bit_reference_file}' does not exist!")
AttributeError: 2bit reference genome file 'resources/hg38/gcpropagon' does not exist!
My call:
singularity run gcparagon.sif -b healthy_sWGS/bams/Genome-IJB-HP-10-xx_S81.markDup.bam -rtb resources/hg38/gcpropagon/hg38_analysisSet.2bit -c resources/hg38/gcpropagon/hg38_minimalExclusionListOverlap_1Mbp_intervals_33pcOverlapLimited.FGCD.bed -rgcd resources/hg38/gcpropagon/hg38_reference_GC_content_distribution.tsv -rgb hg38
I have the file in the specified location for sure:
[user@lm4-f001 ~]$ ls /globalscratch/ulb/bctr/astanciu/resources/hg38/gcpropagon/hg38_analysisSet.2bit
/globalscratch/ulb/bctr/astanciu/resources/hg38/gcpropagon/hg38_analysisSet.2bit
I have created the singularity image using the def script you provided.
Can you help me with this issue?
Thanks, Alexandra