Hi @sipesk,
Can you try re-running without the -t option? There may be an issue with the parallelization.
Let me know if that works or not.
Thanks for using PhyloFisher!
Best, Robert
No dice. I tried both python and python3 as well.
(fisher) au706677@d46989 phylofisher % python3 forest_local.py -i sgt_constructor_out_Apr.24.2024-local.tar.gz
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/local/Cellar/python@3.10/3.10.14/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/usr/local/Cellar/python@3.10/3.10.14/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/pool.py", line 48, in mapstar
return list(map(*args))
File "/Users/au706677/Documents/AU/DeepPurple/Cryobio/Leftovers/EUKBINS/phylofisher/forest_local.py", line 197, in suspicious_clades
groups.add(metadata[org]['Higher Taxonomy'])
NameError: name 'metadata' is not defined
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/au706677/Documents/AU/DeepPurple/Cryobio/Leftovers/EUKBINS/phylofisher/forest_local.py", line 684, in <module>
suspicious = parallel_susp_clades(trees)
File "/Users/au706677/Documents/AU/DeepPurple/Cryobio/Leftovers/EUKBINS/phylofisher/forest_local.py", line 503, in parallel_susp_clades
suspicious = list(pool.map(suspicious_clades, trees))
File "/usr/local/Cellar/python@3.10/3.10.14/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/pool.py", line 367, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/usr/local/Cellar/python@3.10/3.10.14/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/pool.py", line 774, in get
raise self._value
NameError: name 'metadata' is not defined
The directory contains metadata.tsv, and it is a non-empty file.
Hi @robert-ervin-jones, I hit the same problem when testing with the original "metadata.tsv". Is there a way to bypass multiprocessing? Also, my test on a remote server produced no results (only the empty directory "forest_out_M.D.Y" itself).
The error from my local test is below:
python forest_local.py -i sgt_constructor_out_Apr.28.2024-local.tar.gz
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "C:\Program Files\Python38\lib\multiprocessing\pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "C:\Program Files\Python38\lib\multiprocessing\pool.py", line 48, in mapstar
return list(map(*args))
File "D:\2024\05\forest_local.py", line 197, in suspicious_clades
groups.add(metadata[org]['Higher Taxonomy'])
NameError: name 'metadata' is not defined
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "forest_local.py", line 684, in <module>
suspicious = parallel_susp_clades(trees)
File "forest_local.py", line 503, in parallel_susp_clades
suspicious = list(pool.map(suspicious_clades, trees))
File "C:\Program Files\Python38\lib\multiprocessing\pool.py", line 364, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "C:\Program Files\Python38\lib\multiprocessing\pool.py", line 771, in get
raise self._value
NameError: name 'metadata' is not defined
After I added "metadata = {}" at line 20, the error changed to the following:
>python forest_local.py -i sgt_constructor_out_Apr.28.2024-local.tar.gz
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "C:\Program Files\Python38\lib\multiprocessing\pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "C:\Program Files\Python38\lib\multiprocessing\pool.py", line 48, in mapstar
return list(map(*args))
File "D:\2024\05\forest_local.py", line 198, in suspicious_clades
groups.add(metadata[org]['Higher Taxonomy'])
KeyError: 'Tisolute'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "forest_local.py", line 671, in <module>
suspicious = parallel_susp_clades(trees)
File "forest_local.py", line 504, in parallel_susp_clades
suspicious = list(pool.map(suspicious_clades, trees))
File "C:\Program Files\Python38\lib\multiprocessing\pool.py", line 364, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "C:\Program Files\Python38\lib\multiprocessing\pool.py", line 771, in get
raise self._value
KeyError: 'Tisolute'
Then I commented out the function "def parallel_susp_clades(trees)" and changed "suspicious = parallel_susp_clades(trees)" to "suspicious = suspicious_clades(trees)". The error then changed to the following:
>python forest_local.py -i sgt_constructor_out_Apr.28.2024-local.tar.gz
Traceback (most recent call last):
File "forest_local.py", line 672, in <module>
suspicious = suspicious_clades(trees)
File "forest_local.py", line 175, in suspicious_clades
t = Tree(tree)
File "C:\Program Files\Python38\lib\site-packages\ete3\coretype\tree.py", line 212, in __init__
read_newick(newick, root_node = self, format=format,
File "C:\Program Files\Python38\lib\site-packages\ete3\parser\newick.py", line 269, in read_newick
raise NewickError("'newick' argument must be either a filename or a newick string.")
ete3.parser.newick.NewickError: 'newick' argument must be either a filename or a newick string.
You may want to check other newick loading flags like 'format' or 'quoted_node_names'.
Solved in a silly way: I replaced the multiprocessing call with a simple "for" loop, and the results came out smoothly in 1 min. To do this, change:
    if not args.backpropagate:
        suspicious = parallel_susp_clades(trees)
to
    suspicious = []
    if not args.backpropagate:
        # suspicious = parallel_susp_clades(trees)
        for tree in trees:
            suspicious.append(suspicious_clades(tree))
        print(suspicious)
You will see the list of suspicious genes for each tree printed in your terminal. The result files are shown below:
Hope this helps! @robert-ervin-jones @sipesk
Hi @shuiyujinlan,
Would it be possible for you to open a PR with your proposed code changes?
Best, Robert
Had the same issue, and the fix by shuiyujinlan worked for me as well. Thanks!
Sure, I opened a PR just now. I hope my reply and the PR description give you some clues for fixing it more gracefully (e.g. retaining the multiprocessing function).
Hello,
PhyloFisher is great and was running smoothly until I brought the sgt_construct_out.tar.gz to my local machine.
I downloaded forest_local.py and now get an error with the metadata args. I've checked the .tar.gz to make sure that metadata.tsv is there and contains text.