AutoFlowResearch / SmartPeak

Fast and Accurate CE-, GC- and LC-MS(/MS) Data Processing
MIT License

[SmartPeakCLI] Error when provided mzML directory as a symbolic link on Linux #422

Closed: buijt closed this issue 2 years ago

buijt commented 3 years ago

Describe the bug

SmartPeakCLI expects its input files and data files to be contained in the same execution directory. However, the bioinformatics pipeline in my organization is structured so that mzML data files are kept in a directory separate from the SmartPeak execution directory. To get around this, I created a symbolic link within the SmartPeak execution directory that points to the actual mzML data directory on the computing cluster.

However, SmartPeakCLI does not detect the symlinked directory named "mzML" at all and attempts to create the mzML input directory itself. This causes a filesystem conflict, since there is already a symbolic link named "mzML", and execution halts with an error.

To Reproduce

Steps to reproduce the behavior:

  1. Collate all SmartPeak input files into the SmartPeak execution directory, except the mzML files folder
  2. Create a symbolic link named "mzML" in the SmartPeak execution directory that points to the directory containing the mzML files: ln -s /path/to/mzML /smartpeak/mzML
  3. Run SmartPeakCLI
  4. See error
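
The filesystem side of these steps can be sketched without SmartPeak itself. The snippet below is only an illustration under assumptions: the temporary directories and the sample_1.mzML file are hypothetical stand-ins for the cluster data directory and the execution directory, and it says nothing about how SmartPeakCLI checks for the folder internally. It shows that a test which follows links sees the symlink as a directory, while an attempt to create a directory under the same name is refused because the name is already taken.

```shell
#!/bin/sh
# Hypothetical stand-ins for the cluster data directory and the SmartPeak execution directory.
work_dir=$(mktemp -d)
trap 'rm -rf "$work_dir"' EXIT
data_dir="$work_dir/cluster/mzML"
exec_dir="$work_dir/smartpeak"
mkdir -p "$data_dir" "$exec_dir"
touch "$data_dir/sample_1.mzML"   # placeholder data file, not real mzML content

# Step 2 of the report: a link named "mzML" inside the execution directory.
ln -s "$data_dir" "$exec_dir/mzML"

# -L tests the link itself; -d follows the link to its target.
if [ -L "$exec_dir/mzML" ]; then link_state="symlink"; else link_state="not a symlink"; fi
if [ -d "$exec_dir/mzML" ]; then dir_state="directory"; else dir_state="not a directory"; fi
echo "mzML entry is a $link_state and resolves to a $dir_state"

# Creating a directory with the same name fails because the name already exists.
if mkdir "$exec_dir/mzML" 2>/dev/null; then mkdir_result="created"; else mkdir_result="refused"; fi
echo "mkdir on the existing link: $mkdir_result"
```

A program that checks for the folder without following links and then tries to create it would run into the refusal shown by the last command, which matches the conflict described above.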

Expected behavior

SmartPeak accepts the symbolic link to the mzML directory transparently, as if it were a "real" directory, and executes the workflow normally.

logs

Version information

Additional context

This would also be solved by https://github.com/AutoFlowResearch/SmartPeak/issues/417, which proposes a separate CLI option for the mzML input directory.

ahmedskhalil commented 3 years ago

Thanks for submitting the issue. To use symlinks for any files or folders used by SmartPeakCLI, please make sure you provide the full path for both the target and the link_name, like so:

    ln -s /common/.../path/to/mzML /common/.../path/to/symlink/mzML

If you opt for symlinks in your datasets, it is also recommended to provide the full path to the sequence.csv file, like so:

    ./SmartPeakCLI -l /common/.../dataset/sequence.csv

Relative paths proved unreliable when we reproduced the issue locally, whereas full paths work as expected. Let us know if this doesn't work out either.
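
As a rough sketch of this advice, the script below builds the link from absolute paths and prints the matching SmartPeakCLI call instead of running it. The temporary directory layout is hypothetical and stands in for the real /common/... locations; only the -l option and the sequence.csv file name are taken from the comment above.

```shell
#!/bin/sh
# Hypothetical layout in a temporary directory; substitute the real data and dataset paths.
work_dir=$(mktemp -d)
trap 'rm -rf "$work_dir"' EXIT
mkdir -p "$work_dir/data/mzML" "$work_dir/dataset"
touch "$work_dir/dataset/sequence.csv"   # empty placeholder file

# Resolve both locations to absolute physical paths before linking.
target=$(cd "$work_dir/data/mzML" && pwd -P)
dataset=$(cd "$work_dir/dataset" && pwd -P)

# Full path for both the target and the link name.
ln -s "$target" "$dataset/mzML"

# The stored link target should begin with "/", i.e. be absolute.
link_target=$(readlink "$dataset/mzML")
echo "mzML -> $link_target"

# Print the invocation with the full path to sequence.csv (not executed here).
echo "./SmartPeakCLI -l $dataset/sequence.csv"
```

Resolving the paths with cd and pwd -P first means the link never stores a path relative to whatever directory the command happened to be run from.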