Open ghost opened 2 years ago
Hello @HarukiNakamura,
No worries. Thanks for finding issues and providing suggestions to TransPi. We appreciate it.
You are right, the last column will cause issues since the name has a comma and SOS_busco.py
will fail. I think the easiest solution is what you suggested. I will do a test and modify the code. Thanks!
Best, Ramón
Pinging @n-conci
This works:
1517 cat full_table_*.tsv | grep -v "#" | tr "\t" "," | cut -d ',' -f1-5 >.busco_names.txt
1591 cat $transpi_tsv | grep -v "#" | tr "\t" "," | cut -d ',' -f1-5 >>$all_busco
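For anyone who wants to check the effect of that pipeline outside Nextflow, here is a minimal Python sketch of the same trimming step. It is not part of TransPi; the function name `first_five_fields` and the sample rows are made up for illustration, and the rows are assumed to be tab-separated as in a BUSCO full table.

```python
def first_five_fields(lines):
    """Rough equivalent of: grep -v "#" | tr "\t" "," | cut -d ',' -f1-5"""
    trimmed = []
    for line in lines:
        line = line.rstrip("\n")
        # grep -v "#": drop every line that contains a '#' anywhere
        if "#" in line:
            continue
        # tr "\t" ",": tabs become commas, so commas already present in a
        # description also act as separators from here on
        fields = line.replace("\t", ",").split(",")
        # cut -d ',' -f1-5: keep only the first five fields
        trimmed.append(",".join(fields[:5]))
    return trimmed


# Hypothetical tab-separated input, including a description with a comma
rows = [
    "# Busco id\tStatus\tSequence\tScore\tLength",
    "\t".join(["0at38820", "Duplicated", "SOAP.k25.scaffold27258", "8202.3",
               "4167", "https://www.orthodb.org/v10?query=0at38820", "sacsin"]),
    "\t".join(["121at38820", "Complete", "SOAP.k25.scaffold11722", "3027.5",
               "1446", "https://www.orthodb.org/v10?query=121at38820",
               "Zinc finger, RING-type"]),
]
result = first_five_fields(rows)
for row in result:
    print(row)
```

Every surviving row has exactly five fields, so a comma inside the description column can no longer change the column count.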
Hi, I apologize for contacting you so often.
While running SOS_busco.py in the busco4_dist process, I got the following error.
I think the problem is in the input file for SOS_busco.py (in my case, Read_R_all_busco4.tsv). Most lines of my Read_R_all_busco4.tsv have 6 commas (7 columns), like this:
0at38820,Duplicated,SOAP.k25.scaffold27258,8202.3,4167,https://www.orthodb.org/v10?query=0at38820,sacsin
However, some lines of my file have 7 or 8 commas (8 or 9 columns), like this:
121at38820,Complete,SOAP.k25.scaffold11722,3027.5,1446,https://www.orthodb.org/v10?query=121at38820,Zinc finger, RING-type
I think this difference in the number of commas (columns) is the cause of the pandas error. SOS_busco.py does not seem to use columns 6 onwards of the input file. If so, we can remove columns 6 onwards before running SOS_busco.py. https://github.com/PalMuc/TransPi/blob/899d16028e2d84e746c8c0dda1c6ba9ebcca050e/TransPi.nf#L1591-L1592
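To confirm that a file has this problem, the number of comma-separated fields per line can be tallied. This is only a small diagnostic sketch using the two rows quoted above; the helper name `field_counts` is hypothetical and is not part of TransPi or SOS_busco.py.

```python
from collections import Counter


def field_counts(lines, sep=","):
    """Tally how many non-empty lines have each number of fields."""
    return Counter(
        len(line.rstrip("\n").split(sep)) for line in lines if line.strip()
    )


# The two example rows from this report
sample = [
    "0at38820,Duplicated,SOAP.k25.scaffold27258,8202.3,4167,"
    "https://www.orthodb.org/v10?query=0at38820,sacsin",
    "121at38820,Complete,SOAP.k25.scaffold11722,3027.5,1446,"
    "https://www.orthodb.org/v10?query=121at38820,Zinc finger, RING-type",
]
counts = field_counts(sample)
print(dict(counts))
```

More than one distinct field count in the tally means the rows are not all the same width, which is the situation described here. To check a real file, pass an open file handle instead of `sample`.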
This is an example of my suggestion for revising.
I hope this helps you. Thank you.