Open mpgriesh opened 3 hours ago
Could you please provide the error message you get when running with the '-debug' parameter added? I'll have a look tomorrow morning.
Best
Thanks for the fast response and tool. Absolutely love it!
With a smaller example, it runs without error because the contig headers in the gffs were unique. Non-unique headers in the bigger set cause this error:
Traceback (most recent call last):
File "/labs/asbhatt/mpgriesh/tools/miniconda3/lib/python3.12/site-packages/lovis4u/DataProcessing.py", line 735, in cluster_sequences
self.locus_annotation.loc[locus.seq_id, "group"] = locus.group
File "/labs/asbhatt/mpgriesh/tools/miniconda3/lib/python3.12/site-packages/pandas/core/indexing.py", line 911, in __setitem__
iloc._setitem_with_indexer(indexer, value, self.name)
File "/labs/asbhatt/mpgriesh/tools/miniconda3/lib/python3.12/site-packages/pandas/core/indexing.py", line 1944, in _setitem_with_indexer
self._setitem_single_block(indexer, value, name)
File "/labs/asbhatt/mpgriesh/tools/miniconda3/lib/python3.12/site-packages/pandas/core/indexing.py", line 2189, in _setitem_single_block
value = self._align_series(indexer, Series(value))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/labs/asbhatt/mpgriesh/tools/miniconda3/lib/python3.12/site-packages/pandas/core/indexing.py", line 2455, in _align_series
raise ValueError("Incompatible indexer with Series")
ValueError: Incompatible indexer with Series
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/labs/asbhatt/mpgriesh/tools/miniconda3/bin/lovis4u", line 26, in <module>
loci.cluster_sequences(mmseqs_clustering_results, one_cluster= parameters.args["one_cluster"])
File "/labs/asbhatt/mpgriesh/tools/miniconda3/lib/python3.12/site-packages/lovis4u/DataProcessing.py", line 762, in cluster_sequences
raise lovis4u.Manager.lovis4uError("Unable to cluster loci sequences.") from error
lovis4u.Manager.lovis4uError: Unable to cluster loci sequences.
I expected the track names to be dependent on the file name rather than the contig headers. Updating the contig headers fixed that issue.
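In case it helps anyone else hitting this, here is a minimal sketch of how the input folder could be checked for contig headers shared between files before running the tool. It is not part of lovis4u: the helper names `collect_seq_ids` and `find_shared_seq_ids` are made up for illustration, and it assumes the files end in `.gff` and that the contig header is the first tab-separated column of each feature line, as in standard GFF3.

```python
from collections import defaultdict
from pathlib import Path
import tempfile


def collect_seq_ids(gff_path):
    """Return the set of sequence IDs (column 1) used by feature lines in one GFF file."""
    seq_ids = set()
    with open(gff_path) as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # embedded sequence section begins; no feature lines follow
            if line.startswith("#") or not line.strip():
                continue  # skip directives, comments, and blank lines
            seq_ids.add(line.split("\t", 1)[0])
    return seq_ids


def find_shared_seq_ids(gff_dir):
    """Map each sequence ID found in more than one file to the names of those files."""
    files_by_id = defaultdict(list)
    # Assumption: inputs use the .gff extension; adjust the pattern if yours differ.
    for gff_path in sorted(Path(gff_dir).glob("*.gff")):
        for seq_id in collect_seq_ids(gff_path):
            files_by_id[seq_id].append(gff_path.name)
    return {seq_id: names for seq_id, names in files_by_id.items() if len(names) > 1}


if __name__ == "__main__":
    # Tiny self-contained demo with two made-up files that reuse the header "contig_1".
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "plasmid_a.gff").write_text(
            "##gff-version 3\ncontig_1\tprodigal\tCDS\t1\t90\t.\t+\t0\tID=a1\n"
        )
        Path(tmp, "plasmid_b.gff").write_text(
            "##gff-version 3\ncontig_1\tprodigal\tCDS\t5\t50\t.\t-\t0\tID=b1\n"
        )
        for seq_id, names in find_shared_seq_ids(tmp).items():
            print(f"{seq_id} appears in: {', '.join(names)}")
```

Pointing `find_shared_seq_ids` at the real input folder (here `plasmid_gffs/`) would list any header that needs renaming. It only reports headers repeated across files, not repeats within a single file.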
For a small test set of plasmids, I see homologous sequences are rotated relative to each other based on the assembler arbitrarily setting the coordinates. Is there a way within this tool to rotate tracks to best align homologous sequences? Sorry if I missed that...
It's unclear why clustering might fail given very similar input plasmid sequences, which makes it difficult to troubleshoot. Can you help me understand why this might happen?
lovis4u -gff plasmid_gffs/ -o plasmid_lovis/ -hl --reorient_loci
⦿ 85 loci were loaded from extended gff files folder
○ Running mmseqs for protein clustering...
⦿ 449 clusters for 3267 proteins were found with mmseqs
mmseqs clustering results were saved to plasmid_lovis/mmseqs/mmseqs_clustering.tsv
lovis4uError 💔: Unable to cluster loci sequences.