GATB / gatb-minia-pipeline

GATB Minia assembly pipeline

hdf5 issue - unable to open file #19

Closed Krannich479 closed 4 years ago

Krannich479 commented 4 years ago

Hi, I previously ran the gatb-minia-pipeline successfully on CentOS Linux 7.6, but when I switched to Ubuntu 16.04 LTS I ran into some trouble:

(2019-11-01 14:54:52) GATB-pipeline starting
(2019-11-01 14:54:52) Command line: /opt/gatb-minia-pipeline/gatb --nb-cores 6 -1 paired.1.fastq -2 paired.2.fastq -s single.fastq -o metafiles/assembly 

(2019-11-01 14:54:52) Setting maximum kmer length to: 150 bp
(2019-11-01 14:54:52) Multi-k values and cutoffs: [(21, 2), (41, 2), (61, 2), (81, 2), (101, 2), (121, 2), (141, 2)]

(2019-11-01 14:54:52) Minia assembling at k=21 min_abundance=2
(2019-11-01 14:54:52) Execution of 'minia/minia'. Command line: 
     /opt/gatb-minia-pipeline/tools/memused /opt/gatb-minia-pipeline/minia/minia -in metafiles/assembly.list_reads -kmer-size 21 -abundance-min 2 -out metafiles/assembly_k21
Minia 3, git commit 40f35ad
setting storage type to hdf5
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
  #000: /scratchdir/builds/workspace/gatb-minia/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5F.c line 509 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file

Has this been seen/solved before? Thanks.

rchikhi commented 4 years ago

Hi Thomas,

I'm not sure about this one. Can you make sure that the input files exist and that the metafiles folder exists? Does the pipeline fail on every input on this system, even the simple test provided with it?
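The two checks suggested above could be scripted along the following lines. This is only a sketch: the function name preflight is hypothetical and not part of the pipeline, and it assumes that the value passed to -o is a prefix whose parent folder must already exist, which the thread suggests but does not confirm.

```python
from pathlib import Path


def preflight(inputs, output_prefix):
    """Return a list of problems; an empty list means both checks passed.

    `preflight` is a hypothetical helper, not part of gatb-minia-pipeline.
    """
    problems = []
    # Check 1: every input file must exist.
    for name in inputs:
        if not Path(name).is_file():
            problems.append(f"missing input file: {name}")
    # Check 2: the folder that will hold the output prefix must exist
    # (assumption: -o is a prefix such as metafiles/assembly).
    out_dir = Path(output_prefix).parent
    if not out_dir.is_dir():
        problems.append(f"missing output folder: {out_dir}")
    return problems


if __name__ == "__main__":
    # File names taken from the command line in the report above.
    issues = preflight(
        ["paired.1.fastq", "paired.2.fastq", "single.fastq"],
        "metafiles/assembly",
    )
    print("\n".join(issues) or "all paths look fine")
```

Running it from the same directory in which the pipeline is started shows whether the relative names resolve there at all.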

Rayan

svenrahmann commented 4 years ago

In my experience, such errors result from specifying paths for input or output files; minia doesn't deal well with that. If I start in my project root and call, for instance, ./gatb-pipeline/gatb -1 data/reads1.fq -2 data/reads2.fq ... then minia will produce "file list" files that contain the data/ path prefix, but it will resolve those entries not relative to the project root (.) but relative to something else (perhaps the directory where gatb is, I don't know), and then it may not find the .fq files. A similar issue seems to arise when it wants to read or write h5 files: perhaps a directory that is assumed to exist does not exist. My workaround was not to include any paths in filenames. This is not beautiful, especially if you want to combine it with workflow management such as Snakemake, but it works.

rchikhi commented 4 years ago

Hi Sven, yes, in fact gatb-pipeline is even less robust than minia (as it accumulates the bugs of both :) ). Sorry about that. It is indeed safer to specify full paths rather than relative paths. As for the output, it's a byproduct of the fact that I always run analyses in the current folder (e.g. cd /my/data && /path/to/gatb_pipeline/gatb -l file.fastq); not the most versatile handling of paths, I know.
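For anyone driving the pipeline from a script or a workflow manager, the "run it from inside the data folder" habit can be imitated by setting the working directory of the subprocess. The sketch below is an illustration only: the helper names are hypothetical, the gatb location is the one from the log earlier in this thread, and it assumes both read files sit in the same directory.

```python
import subprocess
from pathlib import Path

# Install location as seen in the log above; adjust for your system.
GATB = "/opt/gatb-minia-pipeline/gatb"


def plan_run_in_data_dir(reads_1, reads_2, out_name, gatb=GATB):
    """Return (command, working_dir) so the pipeline only sees bare file names.

    Hypothetical helper; -1, -2 and -o are the flags used in this thread.
    """
    r1 = Path(reads_1).resolve()
    r2 = Path(reads_2).resolve()
    if r1.parent != r2.parent:
        raise ValueError("both read files must live in the same directory")
    # Strip every directory component; the output ends up next to the reads.
    command = [gatb, "-1", r1.name, "-2", r2.name, "-o", Path(out_name).name]
    return command, r1.parent


def run_in_data_dir(reads_1, reads_2, out_name, gatb=GATB):
    """Equivalent of: cd <data dir> && gatb -1 ... -2 ... -o ..."""
    command, workdir = plan_run_in_data_dir(reads_1, reads_2, out_name, gatb)
    return subprocess.run(command, cwd=workdir, check=True)
```

The downside is the one already mentioned: the results land in the data folder, so a workflow tool would have to move or declare them there.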

rchikhi commented 4 years ago

I'm going to close this one, feel free to reopen in case the problem occurs again.

Krannich479 commented 2 years ago

I understand that I am incredibly late to the party, but I ran into this problem again and remembered that this ticket exists. Thank you @rchikhi and @svenrahmann for commenting on this and providing valuable hints! For the gatb-minia-pipeline in particular, specifying the absolute path for the output folder (parameter -o) solved the issue.

Bonus: specifying absolute paths for the input files alone does not help. It looks like it does at first, but because the output path also has to be absolute (see above), the run breaks after the first iteration.
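A minimal sketch of that fix, for anyone who builds the command line from a script: it makes every path absolute before handing it to the pipeline, the -o value in particular. The function name build_gatb_command is hypothetical; the flags (--nb-cores, -1, -2, -s, -o) and the gatb location are the ones from the original report.

```python
import os

# Install location as seen in the original report; adjust for your system.
GATB = "/opt/gatb-minia-pipeline/gatb"


def build_gatb_command(reads_1, reads_2, out_prefix,
                       single=None, cores=None, gatb=GATB):
    """Build a gatb command line in which every path is absolute.

    Hypothetical helper, not part of the pipeline itself.
    """
    cmd = [gatb]
    if cores is not None:
        cmd += ["--nb-cores", str(cores)]
    cmd += ["-1", os.path.abspath(reads_1), "-2", os.path.abspath(reads_2)]
    if single is not None:
        cmd += ["-s", os.path.abspath(single)]
    # The absolute output path is the part that resolved the issue here.
    cmd += ["-o", os.path.abspath(out_prefix)]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_gatb_command(
        "paired.1.fastq", "paired.2.fastq", "metafiles/assembly",
        single="single.fastq", cores=6)))
```

The resulting list can be passed to subprocess.run, or joined into a shell command for a Snakemake rule.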