GATB / gatb-core

Core library of the Genome Analysis Toolbox with de-Bruijn graph
https://gatb.inria.fr/software/gatb-core/

Having some troubles with File of files #16

Closed by leoisl 7 years ago

leoisl commented 7 years ago

Hello,

When giving a file of files (fof) to gatb::core::debruijn::impl::Graph::create() (via the -in parameter) to create a graph, I ran into several problems with relative paths. I am using the latest GATB release, v1.3.0, so it may be that you have already fixed this in recent commits. Apologies if so: I looked for a commit dealing with this but could not find one, and I did not try the version on master.

One example is if the directory structure is like this:

|--gatb_exec (the executable that creates the graph and does stuff)
|--input
    |--reads1.fa
    |--reads2.fa
    |--...
    |--fof

and fof contains:

input/reads1.fa
input/reads2.fa
...

Attempting to create the graph results in:

EXCEPTION: Unable to open bank 'output/tmp/readsFile' (if it is a list of files, perhaps some of the files inside don't exist)

However, all the files do exist, and the paths are correct.

Weirdly enough, if the structure is a little bit different, and fof is:

../input/reads1.fa
../input/reads2.fa
...

it works. (Maybe in the first example I should prefix every file with ./ ? I did not test this.)

The temporary workaround I have found so far is to read the fof, create another fof containing the absolute path of each file (using boost::filesystem::canonical()), and give this absolute fof to gatb::core::debruijn::impl::Graph::create(). However, when I write out this absolute fof, the paths come out wrapped in quotes (presumably because paths can contain spaces), e.g.:

"/Users/ishi/Desktop/kissplicesvn/kissplice/branches/CDBG_GWAS_GATB_1_3_0/cdbggwas/build/cdbggwas-1.5.5-Darwin/sample_example/GCA_001449145.1_WH-SGI-V-07060_genomic.fna"

And the graph is still not built:

EXCEPTION: Unable to open bank 'output/tmp/readsFile' (if it is a list of files, perhaps some of the files inside don't exist)

If the quotes are removed from the path, the graph is built successfully, but this will surely cause problems for users who have spaces in their paths.
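
For readers hitting the same quoting behaviour, here is a minimal sketch of the workaround described above. It rests on several assumptions: it uses the C++17 std::filesystem library rather than Boost.Filesystem, it uses lexical normalization instead of canonical() so that it never touches the disk, and the helper names streamedForm and absoluteEntries are hypothetical (they are not part of GATB or Boost). With std::filesystem, the quotes come from streaming a path object with operator<<, while calling .string() on the path yields the bare text.

```cpp
#include <cassert>
#include <filesystem>
#include <sstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Streaming a std::filesystem::path with operator<< wraps it in double
// quotes (the standard specifies std::quoted for this inserter).
std::string streamedForm(const fs::path& p) {
    std::ostringstream out;
    out << p;
    return out.str();
}

// Hypothetical helper (not a GATB or Boost function): turn each fof entry
// into an absolute, unquoted path string. Relative entries are resolved
// against baseDir; entries that are already absolute are kept unchanged.
std::vector<std::string> absoluteEntries(const std::vector<std::string>& entries,
                                         const fs::path& baseDir) {
    assert(baseDir.is_absolute());
    std::vector<std::string> result;
    for (const std::string& entry : entries) {
        if (entry.empty()) continue;  // skip blank lines in the fof
        // operator/ returns `entry` itself when it is already absolute.
        fs::path full = (baseDir / entry).lexically_normal();
        result.push_back(full.string());  // .string() adds no quotes
    }
    return result;
}
```

Writing each returned string on its own line produces a fof of absolute paths without quotes. Whether paths containing spaces are then accepted by the bank reader is not something this sketch can confirm.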

I guess this is probably a minor bug, but since in several tools (including this one) it is usually the user who builds the fof, it could give them some headaches...

Thanks!!

rchikhi commented 7 years ago

Hi Leandro, the paths inside your list of files should be relative to the location of the list, not to the current working directory or the location of the executable. So in your case, a list that contains:

reads1.fa
reads2.fa

should work. With absolute paths, what happens if you remove the quotes? Are spaces causing a problem? Perhaps try escaping the spaces (a backslash followed by the space character).
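
The rule stated in this reply can be illustrated with a short sketch. This is not GATB's actual code: resolveFofEntry is a hypothetical helper, written with C++17 std::filesystem, that only mimics the described behaviour of resolving each entry against the directory that contains the fof.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper, not part of GATB: resolve one fof entry relative to
// the directory holding the fof itself, as the reply above describes.
fs::path resolveFofEntry(const fs::path& fofPath, const std::string& entry) {
    assert(!entry.empty());
    // operator/ keeps `entry` unchanged when it is already absolute.
    return (fofPath.parent_path() / entry).lexically_normal();
}
```

Under this rule, with the fof at input/fof, the entry reads1.fa resolves to input/reads1.fa, and ../input/reads1.fa resolves to the same file, while input/reads1.fa resolves to input/input/reads1.fa, which does not exist. That is consistent with the behaviour reported in the original post.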

leoisl commented 7 years ago

Hello,

OK, I did not know it was so simple. I had the impression that I had tried this before, but apparently I had not. Everything is fine now, thanks! Sorry for opening this issue; I should have re-checked what I thought I had done.

Thanks!