Open waemm opened 3 years ago
Hi @waemm! Thanks for reporting!
I don't think we are symlinking in https://github.com/bcbio/bcbio-nextgen/blob/master/bcbio/distributed/split.py#L18.
A common way to check for this is to dig around the directory with:
cd /suspected/directory
find -L ./ -mindepth 15
(this lists entries nested 15 or more levels deep, which usually points to circular symlinks).
Have you tried to investigate the directory?
/shared/pipeline-user/run_data/Exome_data_/Tumor_only_neo_batch4_wrapper_run/tumor_only_neobatch4/tumor_only_neobatch4_samples/work/mutect2/
What is the real path? Are there any symlinks involved?
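To answer the two questions above from Python rather than the shell, a minimal sketch could look like the following. The helper names `describe_path` and `check` are hypothetical (they are not part of bcbio); the sketch only uses the standard library and assumes a POSIX file system.

```python
import errno
import os


def describe_path(path):
    """Return (component, link_target) pairs for every symlink along `path`."""
    links = []
    current = os.sep if os.path.isabs(path) else ""
    for part in path.strip(os.sep).split(os.sep):
        current = os.path.join(current, part)
        if os.path.islink(current):
            links.append((current, os.readlink(current)))
    return links


def check(path):
    """Print the symlinks along `path` and report whether it can be resolved."""
    for link, target in describe_path(path):
        print(f"symlink: {link} -> {target}")
    try:
        os.stat(path)  # follows symlinks; fails with ELOOP on a circular chain
    except OSError as exc:
        if exc.errno == errno.ELOOP:
            print(f"ELOOP while resolving {path}")
            return False
        raise  # any other error (e.g. missing file) is not handled here
    print(f"real path: {os.path.realpath(path)}")
    return True
```

Calling `check()` on the `.tbi` path from the error message would show whether any component is a symlink and whether the path resolves at the moment of the call.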
You suspected that too many processes were accessing the file. How many samples are you processing? What is your parallel configuration, i.e., how many worker jobs are created?
We had a somewhat related issue: https://github.com/bcbio/bcbio-nextgen/issues/3167 @gis-nlsim, have you discovered anything useful since then?
Sergey
Sorry, been caught up with other projects so I haven’t been trying to install bcbio. Will try it again at the end of this month.
Hi @naumenko-sa, thanks for your reply! I am not sure what is causing this. It has only happened since I included a PON for mutect2. About your questions:
I am wondering about this, as it seems to take issue with the index file in both my case and @gis-nlsim's. Could it be how this file is read by the process, or an issue with the OS not allowing enough connections to it? It is a really strange error, as nothing is being symlinked. I have seen this error associated with gzip before (a completely unrelated issue in different software). I'm not sure who I could ask or who might know what is going on here.
I'm not sure if it makes a difference that we're using a single mounted drive across the whole cluster. This has never been an issue, but I bring it up just in case.
Hi @waemm !
All clusters use one shared file system or another, so that should not be an issue. Have you tried reducing the number of reading processes to test the high-load hypothesis? I.e., start bcbio with 1, 2, or 5 nodes (16, 32, 80 cores). Does it pass?
Not sure if it is related, but sometimes increasing the memory of the controller job helps (-r conmem=4):
https://bcbio-nextgen.readthedocs.io/en/latest/contents/parallel.html#ipython-parallel
Sergey
Closing for now, please feel free to re-open if there is evidence for further investigation.
I have the same problem as waemm; it seems we have no solution for this yet.
Sorry, I missed it. I can reopen. @dauss75, could you please describe it again so that I can create a reproducible issue? SN
Hi @naumenko-sa !
I'm running into a similar issue. As above, it is caused by a PON file provided to mutect2: OSError: [Errno 40] Too many levels of symbolic links: '/home/pipeline/bcbio/project/work/mutect2/panels/1000g.hg38.vcf.gz.tbi'
Here is a summary of our setup:
/home/pipeline/bcbio/ inside the container, and the bcbio work directory is at /home/pipeline/bcbio/project/work/.
cp -L 1000g.hg38.vcf.gz.tbi /home/pipeline/bcbio/1000g.hg38.vcf.gz.tbi, which copies the PON index file from the Nextflow staging area to a place where bcbio will find it (similarly with other resources).
mv, and then tried cp -L to see if it fixed the problem, to no avail.
I've attached the full Nextflow logs, which contain the bcbio logs, for the run that failed with the symlink error on AWS Batch: bcbio.awsbatch.failure.log. I can also share the logs for the same run, which terminates successfully when it is run locally.
Thank you!
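For reference, the dereferencing copy described in the setup above (`cp -L`) can be sketched in Python as follows. The function name `copy_dereferenced` is hypothetical, and this only illustrates the staging step; it is not how bcbio or Nextflow handle the file internally, and since `cp -L` did not fix the problem here, it should not be read as a fix.

```python
import os
import shutil


def copy_dereferenced(src, dst):
    """Copy the file that `src` points to, so `dst` is a regular file (like `cp -L`)."""
    if os.path.lexists(dst):
        # Remove a stale file or link first; otherwise copyfile would write
        # through an existing symlink at `dst` instead of replacing it.
        os.remove(dst)
    shutil.copyfile(src, dst, follow_symlinks=True)
    if os.path.islink(dst) or not os.path.isfile(dst):
        raise RuntimeError(f"{dst} is not a regular file after copy")
    return dst
```

The check at the end is the useful part for debugging: it confirms that the staged index is a regular file, which rules out a symlink at the destination itself.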
Hi everyone,
I have received this error several times when running bcbio with a PON for mutect. It looks like too many instances are trying to access this file? I'm not sure what is causing this or if anyone has suggestions as to how I could prevent this from happening? If I rerun bcbio it continues on without any issues.
The error:
[12:apply]: OSError: [Errno 40] Too many levels of symbolic links: '/shared/pipeline-user/run_data/Exomedata/Tumor_only_neo_batch4_wrapper_run/tumor_only_neobatch4/tumor_only_neobatch4_samples/work/mutect2/panels/pon_v2.vcf.gz.tbi'
[40:apply]: OSError: [Errno 40] Too many levels of symbolic links: '/shared/pipeline-user/run_data/Exomedata/Tumor_only_neo_batch4_wrapper_run/tumor_only_neobatch4/tumor_only_neobatch4_samples/work/mutect2/panels/pon_v2.vcf.gz.tbi'
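Since a rerun continues without issues, the error looks transient. One way to test that assumption outside of bcbio is a small retry wrapper around the failing file access. This is a hypothetical sketch (`retry_on_eloop` is not a bcbio function), and it only retries on errno 40 (ELOOP), re-raising everything else.

```python
import errno
import time


def retry_on_eloop(func, *args, attempts=3, delay=1.0, **kwargs):
    """Call func(*args, **kwargs), retrying only when it fails with ELOOP."""
    for attempt in range(1, attempts + 1):
        try:
            return func(*args, **kwargs)
        except OSError as exc:
            # Give up on any other error, or once the last attempt has failed.
            if exc.errno != errno.ELOOP or attempt == attempts:
                raise
            time.sleep(delay * attempt)  # simple linear back-off
```

For example, `retry_on_eloop(os.stat, path_to_tbi)` (with `os` imported and `path_to_tbi` set to the index path) would show whether the same path resolves a moment after an ELOOP failure.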
Version info
bcbio version (bcbio_nextgen.py --version): 1.2.3
OS name and version (lsb_release -ds): "CentOS Linux release 7.5.1804 (Core)"
Your sample configuration file:
Observed behavior
Error message or bcbio output: