OLC-Bioinformatics / ConFindr

Intra-species bacterial contamination detection
https://olc-bioinformatics.github.io/ConFindr/
MIT License

DB location not writable #9

Closed andersgs closed 5 years ago

andersgs commented 5 years ago

Hi.

I am just trying out confindr. In our server setup, we have a linuxbrew user where we keep all tools for general use by all users (including the DBs). I have installed confindr as that user, set up the DB as that user, and made sure everyone "knows" where it is by using the CONFINDR_DB env variable.

However, when trying to run confindr as myself, I get the following error:

```
  2018-11-06 07:34:03  Beginning analysis of sample R1...
  2018-11-06 07:34:03  Checking for cross-species contamination...
  2018-11-06 07:34:19  Setting up genus-specific database for genus Klebsiella...
Traceback (most recent call last):
  File "/home/linuxbrew/.linuxbrew/bin/confindr.py", line 993, in <module>
    Xmx=args.Xmx)
  File "/home/linuxbrew/.linuxbrew/bin/confindr.py", line 510, in find_contamination
    setup_allelespecific_database(databases_folder, genus, allele_list)
  File "/home/linuxbrew/.linuxbrew/bin/confindr.py", line 140, in setup_allelespecific_database
    with open(os.path.join(database_folder, '{}_db.fasta'.format(genus)), 'w') as f:
PermissionError: [Errno 13] Permission denied: '/home/linuxbrew/db/confindr/Klebsiella_db.fasta'
```

We would like to avoid having dozens of `confindr` databases across the server. While we could probably create links for each user in writable folders in their home dirs and set the CONFINDR_DB env variable to a value unique to each user, this could probably be easily fixed if there were a `--tmpdir` flag that would allow any temporary DBs (and any other temporary files unique to a run) to be written in a separate folder.
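For reference, the per-user link workaround mentioned above could look roughly like the sketch below. The function name `mirror_shared_db` and the idea of linking file by file are assumptions for illustration, not part of ConFindr, and it has not been checked that ConFindr accepts a database folder made of symlinks.

```python
import os


def mirror_shared_db(shared_dir, user_dir):
    """Symlink every regular file in shared_dir into user_dir.

    Hypothetical helper: the shared files stay read-only in one place,
    while user_dir itself is writable, so new genus-specific files can
    be created next to the links.
    """
    os.makedirs(user_dir, exist_ok=True)
    for name in os.listdir(shared_dir):
        source = os.path.join(shared_dir, name)
        link = os.path.join(user_dir, name)
        # Skip names that already exist so the function can be re-run safely.
        if os.path.isfile(source) and not os.path.lexists(link):
            os.symlink(source, link)
    return user_dir
```

Each user would then point CONFINDR_DB at their own `user_dir`, which is exactly the per-user setup we would like to avoid.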

My suggestion would be to add a  `tmpdir` parameter with default `None` to `find_contamination`, and then on line 506 of confindr.py you would replace or add something along these lines:

Current line:

    sample_database = os.path.join(databases_folder, '{}_db.fasta'.format(genus))

Proposed replacement:

    db_folder = databases_folder if tmpdir is None else tmpdir
    sample_database = os.path.join(db_folder, '{}_db.fasta'.format(genus))



You can then change the function `setup_allelespecific_database` to accept the path to the FASTA file, rather than taking a path to a folder and reconstructing the file path again inside the function.
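To make the suggestion concrete, here is a minimal, self-contained sketch of the two pieces. The names `genus_database_path` and `write_genus_database` are hypothetical stand-ins, not the real ConFindr functions, and the `(header, sequence)` input format is an assumption made only so the example runs on its own.

```python
import os


def genus_database_path(databases_folder, genus, tmpdir=None):
    """Return the path of the genus-specific FASTA file.

    Falls back to databases_folder when no tmpdir is given, which is the
    behaviour proposed in this issue.
    """
    db_folder = databases_folder if tmpdir is None else tmpdir
    return os.path.join(db_folder, '{}_db.fasta'.format(genus))


def write_genus_database(fasta_path, sequences):
    """Write (header, sequence) pairs to fasta_path.

    Hypothetical stand-in for a setup function that receives the final
    file path instead of rebuilding it from a folder and a genus name.
    """
    with open(fasta_path, 'w') as f:
        for header, sequence in sequences:
            f.write('>{}\n{}\n'.format(header, sequence))
```

With this split, the caller decides once where the file lives, and the writer never needs to know whether that location is the shared folder or a temporary one.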

Best. Anders.

lowandrew commented 5 years ago

This is a case I hadn't considered - I'll get this implemented in the near future. The downside is that, the way ConFindr works now, you only ever have to set up a genus-specific database once; if a temporary file is used as the genus-specific database, it will have to be recalculated on every run.

lowandrew commented 5 years ago

Now implemented - ConFindr 0.4.6 has a -tmp option which allows you to specify a directory to write genus-specific databases to, which then get cleaned up at the end of a run (unless you specify the -k option to keep files). The PyPI release has been updated, and the bioconda recipe will get updated later today.
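As a rough illustration of the clean-up behaviour described above (not the actual ConFindr 0.4.6 code), the pattern can be sketched as follows. The function name `run_with_tmp_databases`, the `work` callback, and the choice to delete only entries created during the run are all assumptions.

```python
import os
import shutil


def run_with_tmp_databases(tmp_dir, work, keep_files=False):
    """Run work(tmp_dir), then remove whatever it created in tmp_dir.

    Entries that existed before the run are left alone. Pass
    keep_files=True to skip the clean-up, similar in spirit to -k.
    """
    os.makedirs(tmp_dir, exist_ok=True)
    before = set(os.listdir(tmp_dir))
    try:
        return work(tmp_dir)
    finally:
        if not keep_files:
            # Only delete what appeared during this run.
            for name in set(os.listdir(tmp_dir)) - before:
                path = os.path.join(tmp_dir, name)
                if os.path.isdir(path):
                    shutil.rmtree(path, ignore_errors=True)
                else:
                    os.remove(path)
```

The `try`/`finally` means the clean-up also happens when the analysis raises an exception.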

andersgs commented 5 years ago

Thank you @lowandrew. Yes, I was trying to think about how best to get around the issue of recreating the database every time. I suppose if the user specifies a directory of their choosing and uses the -k option, it could be re-used.
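The re-use idea could be sketched like this. It is only an illustration: whether ConFindr itself skips the rebuild when the file is already present in the -tmp directory is not confirmed in this thread, and `get_or_build_genus_database` and the `build` callback are hypothetical names.

```python
import os


def get_or_build_genus_database(db_dir, genus, build):
    """Return the genus FASTA path in db_dir, building it only if missing.

    build is a callable taking the target path; it is called at most
    once per genus as long as the file is kept between runs.
    """
    os.makedirs(db_dir, exist_ok=True)
    fasta_path = os.path.join(db_dir, '{}_db.fasta'.format(genus))
    if not os.path.isfile(fasta_path):
        build(fasta_path)
    return fasta_path
```

In other words, a user-owned directory plus kept files would behave like a per-user cache of genus-specific databases.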