art-egorov / lovis4u

Bioinformatics tool for Locus Visualisation
Do What The F*ck You Want To Public License

Unable to cluster loci sequences #5

Open mpgriesh opened 3 hours ago

mpgriesh commented 3 hours ago

It's unclear why clustering fails when the input plasmid sequences are very similar, and the error message makes it difficult to troubleshoot. Can you help me understand why this might happen?

lovis4u -gff plasmid_gffs/ -o plasmid_lovis/ -hl --reorient_loci
⦿ 85 loci were loaded from extended gff files folder
○ Running mmseqs for protein clustering...
⦿ 449 clusters for 3267 proteins were found with mmseqs
mmseqs clustering results were saved to plasmid_lovis/mmseqs/mmseqs_clustering.tsv
lovis4uError 💔: Unable to cluster loci sequences.

art-egorov commented 3 hours ago

Could you please provide the error message you get with the '-debug' parameter added? I'll have a look tomorrow morning.

Best

mpgriesh commented 2 hours ago

Thanks for the fast response and tool. Absolutely love it!

A smaller example runs without error because the contig headers in its GFFs are unique. Non-unique headers in the bigger set cause this error:

Traceback (most recent call last):
  File "/labs/asbhatt/mpgriesh/tools/miniconda3/lib/python3.12/site-packages/lovis4u/DataProcessing.py", line 735, in cluster_sequences
    self.locus_annotation.loc[locus.seq_id, "group"] = locus.group
  File "/labs/asbhatt/mpgriesh/tools/miniconda3/lib/python3.12/site-packages/pandas/core/indexing.py", line 911, in __setitem__
    iloc._setitem_with_indexer(indexer, value, self.name)
  File "/labs/asbhatt/mpgriesh/tools/miniconda3/lib/python3.12/site-packages/pandas/core/indexing.py", line 1944, in _setitem_with_indexer
    self._setitem_single_block(indexer, value, name)
  File "/labs/asbhatt/mpgriesh/tools/miniconda3/lib/python3.12/site-packages/pandas/core/indexing.py", line 2189, in _setitem_single_block
    value = self._align_series(indexer, Series(value))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/labs/asbhatt/mpgriesh/tools/miniconda3/lib/python3.12/site-packages/pandas/core/indexing.py", line 2455, in _align_series
    raise ValueError("Incompatible indexer with Series")
ValueError: Incompatible indexer with Series

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/labs/asbhatt/mpgriesh/tools/miniconda3/bin/lovis4u", line 26, in <module>
    loci.cluster_sequences(mmseqs_clustering_results, one_cluster= parameters.args["one_cluster"])
  File "/labs/asbhatt/mpgriesh/tools/miniconda3/lib/python3.12/site-packages/lovis4u/DataProcessing.py", line 762, in cluster_sequences
    raise lovis4u.Manager.lovis4uError("Unable to cluster loci sequences.") from error
lovis4u.Manager.lovis4uError: Unable to cluster loci sequences.

I expected the track names to be dependent on the file name rather than the contig headers. Updating the contig headers fixed that issue.
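
For anyone hitting the same error, here is a minimal sketch of how a GFF folder could be checked for contig headers shared between files before running lovis4u. The function names are hypothetical and not part of lovis4u. It assumes standard tab-separated GFF feature lines with the sequence ID in the first column, an optional `##FASTA` section at the end, and files ending in `.gff`.

```python
from collections import defaultdict
from pathlib import Path


def seq_ids_from_gff_text(text):
    """Return the set of sequence IDs (column 1) found in GFF feature lines."""
    seq_ids = set()
    for line in text.splitlines():
        if line.startswith("##FASTA"):
            break  # sequence section of an extended GFF: no feature lines follow
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        seq_ids.add(line.split("\t", 1)[0])
    return seq_ids


def find_shared_seq_ids(ids_by_file):
    """Map each sequence ID that occurs in more than one file to those files."""
    files_by_id = defaultdict(list)
    for file_name, seq_ids in sorted(ids_by_file.items()):
        for seq_id in seq_ids:
            files_by_id[seq_id].append(file_name)
    return {sid: files for sid, files in files_by_id.items() if len(files) > 1}


def check_gff_folder(folder):
    """Scan every *.gff file in a folder (assumed extension) for shared IDs."""
    ids_by_file = {
        path.name: seq_ids_from_gff_text(path.read_text())
        for path in Path(folder).glob("*.gff")
    }
    return find_shared_seq_ids(ids_by_file)
```

Calling `check_gff_folder("plasmid_gffs/")` returns an empty dict when every contig header is unique across files. Otherwise it lists the files that share each header, which are the ones to rename.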

For a small test set of plasmids, I see that homologous sequences are rotated relative to each other because the assembler sets the start coordinate arbitrarily. Is there a way within this tool to rotate tracks to best align homologous sequences? Sorry if I missed that...