pSONIC run error - GitHub Issues

ssamberkar commented 2 years ago

Hi Justin,

After setting up all the files and parameters correctly, I got the following error:

`Starting pSONIC
Starting Singleton Search
Initial filtering done: 2.2946099638938904
Tether Sets from OrthoFinder All Found: 2.589566469192505
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/opt/ohpc/pub/apps/python/3.9.6/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/opt/ohpc/pub/apps/python/3.9.6/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/home/ssa18/tools/pSONIC/pSONIC.py", line 321, in add_names
    edge.append(gene_list.index(edge[0]))
ValueError: '1_21305' is not in list
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ssa18/tools/pSONIC/pSONIC.py", line 444, in <module>
    parse_args()
  File "/home/ssa18/tools/pSONIC/pSONIC.py", line 442, in parse_args
    main(args.prefix, args.orthogroups, args.threads, args.ploidy, args.sequenceIDs, args.speciesIDs)
  File "/home/ssa18/tools/pSONIC/pSONIC.py", line 382, in main
    singletons = edges_to_groups(singletons, gene_names, threads)
  File "/home/ssa18/tools/pSONIC/pSONIC.py", line 75, in edges_to_groups
    edge_list = pool.map(add_names_part, edge_list)
  File "/opt/ohpc/pub/apps/python/3.9.6/lib/python3.9/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/opt/ohpc/pub/apps/python/3.9.6/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
ValueError: '1_21305' is not in list`

The command I used was:

python3 ~/tools/pSONIC/pSONIC.py rice_test -t 16 -sID SequenceIDs.txt -specID SpeciesIDs.txt -gff rice_test.gff
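The `ValueError` in the traceback comes from `list.index`, which raises when the requested item is not in the list. One way to narrow this down is to check whether every ID in SequenceIDs.txt also appears in the translated GFF. This is only a rough sketch: it assumes the GFF is tab-separated with the gene ID in the second column (the MCScanX layout, as far as I know) and that SequenceIDs.txt lines look like `1_21305: name`. The function names are hypothetical and not part of pSONIC.

```python
def read_sequence_ids(path):
    """Return the set of IDs (text before the first ': ') in SequenceIDs.txt."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line:
                ids.add(line.split(": ", 1)[0])
    return ids


def read_gff_gene_ids(path, gene_column=1):
    """Return the set of gene IDs in a tab-separated GFF.

    gene_column=1 (second column) is an assumption about the file layout.
    """
    ids = set()
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) > gene_column:
                ids.add(fields[gene_column])
    return ids


def ids_missing_from_gff(sequence_ids_path, gff_path):
    """Sorted IDs that are in SequenceIDs.txt but absent from the GFF."""
    missing = read_sequence_ids(sequence_ids_path) - read_gff_gene_ids(gff_path)
    return sorted(missing)
```

Calling `ids_missing_from_gff("SequenceIDs.txt", "rice_test.gff")` would list any IDs the GFF lacks. If `1_21305` shows up there, the GFF is a likely place to look, though this does not prove it is the cause.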

ssamberkar commented 2 years ago

I ran MCScanX both with and without the -b flag, and I still get this error.

conJUSTover commented 2 years ago

Hi Sandeep,

I think this error may be because you have two genes with the same gene ID in your GFF file. Would you mind running the following command and pasting the results here?

grep "1_21305" SequenceIDs.txt | cut -f 2 -d " " | while read line; do grep "$line" SequenceIDs.txt; done
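For anyone who prefers Python to the shell pipeline, roughly the same duplicate check can be run over the whole file at once. This is a minimal sketch, assuming each line of SequenceIDs.txt has the form `ID: name` (as in the output shown in this thread). `find_duplicate_names` is a hypothetical helper, not part of pSONIC, and unlike `grep` it matches names exactly rather than as substrings.

```python
from collections import defaultdict


def find_duplicate_names(path):
    """Map each gene name listed under more than one ID to its list of IDs."""
    ids_by_name = defaultdict(list)
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line or ": " not in line:
                continue
            seq_id, rest = line.split(": ", 1)
            # Keep only the first token of the name, mirroring `cut -f 2 -d " "`.
            name = rest.split()[0]
            ids_by_name[name].append(seq_id)
    return {name: ids for name, ids in ids_by_name.items() if len(ids) > 1}
```

An empty result from `find_duplicate_names("SequenceIDs.txt")` would suggest that no name is listed twice in that file.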

ssamberkar commented 2 years ago

Hey Justin,

It's a single entry:

1_21305: Os132278_Ung0000640.01

conJUSTover commented 2 years ago

Would you mind sending me the SequenceIDs.txt and rice_test.gff files that produce the error? I'm not sure what is causing it, but I can troubleshoot.

Also, can you ensure that you are using the latest version of pSONIC?

ssamberkar commented 2 years ago

Sure,

conJUSTover commented 2 years ago

Hmmmm, I wasn't able to recreate the error that you have.

Can you try running any of the test examples provided? I think the problem may be in your Python version and the way the multiprocessing module interacts with the base Python version. I've had this problem come up when using a conda environment on some machines, so maybe that's a step forward?

ssamberkar commented 2 years ago

Hi Justin,

Been a while since I returned to pSONIC for my pangene analysis. I'm now using 3 rice genomes, each with their own GFFs. The first 2 steps with OrthoFinder and translate_gff worked fine. MCScanX perhaps didn't produce the .tandem files, which is now causing a roadblock for pSONIC to proceed. Here's the error:

Traceback (most recent call last):
  File "/home/ssa18/tools/pSONIC/pSONIC.py", line 444, in <module>
    parse_args()
  File "/home/ssa18/tools/pSONIC/pSONIC.py", line 442, in parse_args
    main(args.prefix, args.orthogroups, args.threads, args.ploidy, args.sequenceIDs, args.speciesIDs)
  File "/home/ssa18/tools/pSONIC/pSONIC.py", line 363, in main
    with open(prefix + ".tandem", "r") as handle:
FileNotFoundError: [Errno 2] No such file or directory: 'ps_test.tandem'

How do I proceed?

Best, Sandeep

conJUSTover commented 2 years ago

Hi Sandeep,

The .tandem file produced by MCScanX is a fundamental input file for pSONIC, so I can't guarantee pSONIC would work without it. My advice would be to double-check that MCScanX didn't produce the file, and if not, rerun MCScanX with the command MCScanX -b 2 <prefix>, where <prefix> is the same prefix you will use for your final step in pSONIC. If your genomes contain ploidy variation, then run MCScanX <prefix> instead.
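A quick pre-flight check along these lines can confirm that the MCScanX outputs exist before pSONIC is launched. It is a minimal sketch: the list of extensions is an assumption based on the files mentioned in this thread (.gff, .collinearity, .tandem), and `missing_inputs` is a hypothetical helper rather than anything pSONIC provides.

```python
import os


def missing_inputs(prefix, extensions=(".gff", ".collinearity", ".tandem")):
    """Return the expected files for `prefix` that do not exist on disk."""
    return [prefix + ext for ext in extensions if not os.path.isfile(prefix + ext)]
```

For the run above, `missing_inputs("ps_test")` would be expected to include `ps_test.tandem`, which matches the `FileNotFoundError`.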

Hope this helps, and let me know if you come across any other problems!

Justin

chengyuye commented 2 years ago

Hello,

Thank you for developing such a good tool. I'm trying to use pSONIC to infer more one-to-one orthologs between Arabidopsis_thaliana and Gossypium_hirsutum, but I have encountered many errors and I'm not sure I've done every step correctly. I hope you can give me some help.

So, the original GFF files I used are GCF_007990345.1_Gossypium_hirsutum_v2.1_genomic.gff and TAIR10_GFF3_genes.gff. Based on my understanding, I first extracted the first column (e.g. Chr1 in A. thaliana), the fourth column (start position), the fifth column (end position), and the gene ID (e.g. ID=AT1G01010.1) from each GFF. I then modified the chromosome names and gene IDs and concatenated the files. I then tried to use the pSONIC translate function, but it didn't work, so I had to build the GFF manually as required (of course, I'm not 100% sure about its accuracy, but I did my best), and I believe I finally obtained the GFF file in the required format. Running MCScanX was then fine, and I got the .tandem and .collinearity files. Then I ran pSONIC, but it kept giving errors like:

`start runing pSONIC
Starting pSONIC
Starting Singleton Search
Initial filtering done: 1.4827887694040933
Tether Sets from OrthoFinder All Found: 1.6325551946957906
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/chengyu/miniconda3/envs/psonic/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/chengyu/miniconda3/envs/psonic/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "pSONIC.py", line 321, in add_names
    edge.append(gene_list.index(edge[0]))
ValueError: '1_6800' is not in list
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "pSONIC.py", line 444, in <module>
    parse_args()
  File "pSONIC.py", line 442, in parse_args
    main(args.prefix, args.orthogroups, args.threads, args.ploidy, args.sequenceIDs, args.speciesIDs)
  File "pSONIC.py", line 382, in main
    singletons = edges_to_groups(singletons, gene_names, threads)
  File "pSONIC.py", line 75, in edges_to_groups
    edge_list = pool.map(add_names_part, edge_list)
  File "/home/chengyu/miniconda3/envs/psonic/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/chengyu/miniconda3/envs/psonic/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
ValueError: '1_6800' is not in list`
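The manual extraction described above (sequence name, start, end, and the ID attribute of each feature) can be sketched in Python so that it is reproducible. This is only an illustration under stated assumptions: the input is standard nine-column GFF3, the output order (chromosome, gene ID, start, end) is what MCScanX expects as far as I know, and `gff3_to_mcscanx_rows` is a hypothetical name. It does not rename chromosomes or translate gene names to OrthoFinder IDs.

```python
def gff3_to_mcscanx_rows(lines, feature_type="gene"):
    """Yield (seqid, gene_id, start, end) for each GFF3 feature of the given type."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != feature_type:
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attributes = dict(
            item.split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        gene_id = attributes.get("ID")
        if gene_id is None:
            continue
        yield fields[0], gene_id, int(fields[3]), int(fields[4])
```

Choosing `feature_type="mRNA"` instead would pick up transcript-level IDs such as `AT1G01010.1`. Whichever level is used, the IDs need to match the protein names given to OrthoFinder.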

I've attached the .gff and SequenceIDs.txt files that I used. If you could point out what is wrong and give some help, that would be much appreciated. Thank you! Gh_At.zip SequenceIDs.txt

Best regards

ssamberkar commented 2 years ago

Hey Justin,

I finally managed to run pSONIC successfully on your test data and on a subset of my own dataset. The key step was to supply MCScanX with an uncompressed BLAST results file.
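In case it helps others who hit the same thing, a small check like the following can tell whether a BLAST results file is gzip-compressed and write an uncompressed copy. It is a minimal sketch using only the standard library. The helper names and any file paths are placeholders, and the assumption that MCScanX reads a plain-text `<prefix>.blast` file should be verified against the MCScanX documentation.

```python
import gzip
import shutil


def is_gzipped(path):
    """True if the file starts with the two-byte gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == b"\x1f\x8b"


def ensure_uncompressed(src, dest):
    """Write an uncompressed copy of `src` to `dest`, gzipped or not."""
    opener = gzip.open if is_gzipped(src) else open
    with opener(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest
```

For example, `ensure_uncompressed("results.txt.gz", "ps_test.blast")` would produce a plain-text file under the prefix used for MCScanX. Both names here are hypothetical.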

I have another question about visualisation, but I'll close this one now! Cheers, Sandeep

sashulkaSh commented 1 year ago

I have exactly the same error!

ValueError: '1_31840' is not in list

grep "1_31840" SequenceIDs.txt | cut -f 2 -d " " | while read line; do grep "$line" SequenceIDs.txt; done

1_31840: dpunc_Mikado_v3.chr21G43.1

sashulkaSh commented 1 year ago

And it’s also funny that the fewer threads I use, the faster the error shows up :)

conJUSTover / pSONIC

pSONIC run error #7