avierstr / amplicon_sorter

Sorts amplicons from Nanopore sequencing data based on similarity

Problems running Amplicon_sorter on a Mac with M1 processor #14

Closed capoony closed 2 days ago

capoony commented 10 months ago

Hi @avierstr,

first of all thanks for this wonderful code.

It runs without problems on a CentOS system with the same Python installation (v3.8; all dependencies installed via pip3), but on my Mac with an M1 processor I get this error message:

--> Reading 1996 out of 1996 sequences longer than 300bp
processing: file_0.todo
Process Process-1:
Traceback (most recent call last):
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/martinkapun/Documents/GitHub/HAPLOTYPES/envs/ampliconsorter/amplicon_sorter/amplicon_sorter.py", line 804, in similarity
    similarg = args.similar_genes/100
NameError: name 'args' is not defined
Traceback (most recent call last):
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/queues.py", line 245, in _feed
    send_bytes(obj)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/connection.py", line 405, in _send_bytes
    self._send(buf)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/queues.py", line 245, in _feed
    send_bytes(obj)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/connection.py", line 411, in _send_bytes
    self._send(header + buf)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
Estimating the ssg value for this dataset
Traceback (most recent call last):
  File "/Users/martinkapun/Documents/GitHub/HAPLOTYPES/envs/ampliconsorter/amplicon_sorter/amplicon_sorter.py", line 2204, in <module>
    sort_groups()
  File "/Users/martinkapun/Documents/GitHub/HAPLOTYPES/envs/ampliconsorter/amplicon_sorter/amplicon_sorter.py", line 2116, in sort_groups
    grouplist = update_list(tempfile)
  File "/Users/martinkapun/Documents/GitHub/HAPLOTYPES/envs/ampliconsorter/amplicon_sorter/amplicon_sorter.py", line 996, in update_list
    estimated_ssg = SSG(tempfile)  # estimate ssg value
  File "/Users/martinkapun/Documents/GitHub/HAPLOTYPES/envs/ampliconsorter/amplicon_sorter/amplicon_sorter.py", line 838, in SSG
    with open(tempfile, 'r') as tf:
FileNotFoundError: [Errno 2] No such file or directory: 'COX1_compare.tmp'

any suggestion would be appreciated!!

Many thanks, Martin

avierstr commented 10 months ago

Hi Martin, are you running it on a physical disk in your Mac or on a network drive (like OneDrive, Dropbox, Google Drive or something Mac-specific...)? Network drives always give problems. If you are using a network drive, try to run it on your local disk. Best regards, Andy

capoony commented 10 months ago

Dear Andy,

thanks so much for your quick response and suggestions. Following your thoughts, I tried the following:

1) I ran the command on a file located on a physical disk not under iCloud control. Unfortunately, with the same result as before

2) I further tested if it was an issue related to the M1 processor. I therefore repeated the same analysis on an older MacBook Pro with an Intel processor. No success either and same error as above

3) I further tested if it may have something to do with my installation. Initially, I installed Python v.3.8 using the native installer and then used pip3 to install the dependencies. Of course, I also adjusted the Shebang in the script to point at the right Python installation. Without success, same error. I therefore tried an alternative way, i.e. via conda:

conda create \
    -p ampliconsorter \
    -y \
    -c conda-forge \
    -c bioconda \
    python=3.8

conda activate ampliconsorter

# install dependencies
conda install \
    -y \
    -c conda-forge \
    -c bioconda \
   matplotlib biopython edlib

conda deactivate

This installation method has always worked for amplicon_sorter on Linux machines running either CentOS or Ubuntu, but unfortunately not on the Mac, which always throws the same error as before.

4) I also tried different Python versions (3.8, 3.9, 3.10), which all worked on Linux, but not on Mac.

I thus assume that amplicon_sorter has problems with Mac-type Unix.

No big issue, but I was just curious if there was an easy fix for this. I accordingly also paste the input file I used, in case this helps.

My commandline for amplicon_sorter always looked like this:

./amplicon_sorter.py  \
   -i COX1.fasta \
   -o demultiplexed/

COX1.fasta.zip

Many thanks for your help with this and all the very best,

Martin

avierstr commented 10 months ago

Dear Martin, Thanks for testing all this. I don't have access to a Mac, so it is difficult to figure out. I program on Linux and was told that the script should also work on a Mac. I know people have used it on a Mac, but I don't know which generation of Mac it was. It is strange that it also does not work on your older MacBook Pro.

I have been googling a bit and did not find much about pure Python code causing problems on a Mac, even though the error message you get seems to come from the Python code itself. But I did find something about the M1 chip that could cause the problem: https://levelup.gitconnected.com/why-your-python-version-or-other-apps-dont-work-on-the-apple-macbook-m1-416af07de57b

So I'm afraid I'm not able to fix that Mac problem for now.

Best regards, Andy

capoony commented 9 months ago

Dear Andy,

please excuse my delayed response. Thanks a lot for looking into this and no worries at all!!

I have another important question: We are currently developing a pipeline for amplicon sequencing and would like to include amplicon_sorter since it clearly outperforms other approaches :-)

Do you plan to add a license to your software (e.g. GPL3 or MIT)? This would be super helpful for us to further use and cite your great software.

Many thanks and all the very best,

Martin

avierstr commented 9 months ago

Hi Martin, No, I have no plans to add a license to my software. I'm happy when it gets cited :-)

Best regards, Andy

owenburroughs commented 4 months ago

Hi @avierstr,

I encountered this same issue on my M1 MacBook, and was able to solve it by adding the following line after the if __name__ == '__main__': check:

if __name__ == '__main__':
    multiprocessing.set_start_method('fork')

I'm afraid I don't know enough about multiprocessing to know what the ramifications of this are (possibly reduced safety/stability?), but I wonder if it might be worthwhile to make this option configurable through a command-line flag as a way to enhance compatibility. I would be happy to put together a pull request. Thanks for the useful tool!
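For context, here is a minimal sketch of why switching to 'fork' can help. It assumes (this is not confirmed from the amplicon_sorter source) that args is a module-level global created only under the main guard. Since Python 3.8 the default start method on macOS is 'spawn', which starts a fresh interpreter that re-imports the script without running the guarded block, so such a global never exists in the worker. The names load_settings, similarity and worker_exit_code are hypothetical stand-ins, not functions from the real script.

```python
import multiprocessing as mp
from argparse import Namespace


def load_settings():
    """Hypothetical stand-in for argument parsing: stores settings in a global."""
    global args
    args = Namespace(similar_genes=80)


def similarity():
    # Same pattern as the failing line in the traceback: reads the global `args`.
    similarg = args.similar_genes / 100
    return similarg


def worker_exit_code(method):
    """Run similarity() in a child started with `method`; return its exit code."""
    ctx = mp.get_context(method)
    proc = ctx.Process(target=similarity)
    proc.start()
    proc.join()
    return proc.exitcode  # 0 on success, non-zero if the worker raised


if __name__ == '__main__':
    load_settings()  # runs in the parent process only
    for method in ('fork', 'spawn'):
        if method in mp.get_all_start_methods():  # 'fork' is POSIX-only
            print(method, worker_exit_code(method))
```

Run as a script on macOS or Linux, the 'fork' worker is expected to exit cleanly, while the 'spawn' worker should fail with a NameError like the one in the traceback above. Passing the needed values to the worker as explicit arguments would avoid depending on the start method at all.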

avierstr commented 4 months ago

Hi @owenburroughs, thanks for this solution. It is easy for me to implement this as a command line flag.
amplicon_sorter_2024_06_18.py.zip

In this zipped version, the command-line flag is '-mac' or '--macOS'. If you have time to test this version, please let me know if it works.
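A rough sketch of how such a flag could be wired up with argparse. Only the flag names '-mac' and '--macOS' come from this thread; the helper names build_parser and choose_start_method are hypothetical, and the actual implementation in the zipped script may differ.

```python
import argparse
import multiprocessing


def build_parser():
    """Parser with only the start-method flag; the real script has more options."""
    parser = argparse.ArgumentParser(description="start-method flag sketch")
    parser.add_argument('-mac', '--macOS', action='store_true',
                        help="use the 'fork' start method for worker processes")
    return parser


def choose_start_method(use_fork):
    """Return 'fork' if requested and available, else None (keep the default)."""
    if use_fork and 'fork' in multiprocessing.get_all_start_methods():
        return 'fork'
    return None


if __name__ == '__main__':
    args = build_parser().parse_args()
    method = choose_start_method(args.macOS)
    if method is not None:
        # Must be called at most once, before any worker process is created.
        multiprocessing.set_start_method(method)
    print(multiprocessing.get_start_method())
```

Keeping the platform default unless the flag is given leaves behaviour on Linux unchanged, and the availability check avoids an error on platforms without 'fork'.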

According to the Python docs, there are 3 start methods (spawn, fork and forkserver): https://docs.python.org/3/library/multiprocessing.html

I found this on https://github.com/python/cpython/issues/77906 and I assume the same applies to amplicon_sorter.

[Screenshot at 2024-06-18 19-50-40]

Greets, Andy

avierstr commented 2 days ago

I have now added a single-core version that should work: amplicon_sorter_single.py