bacpop / PopPUNK

PopPUNK 👨‍🎤 (POPulation Partitioning Using Nucleotide Kmers)
https://www.bacpop.org/poppunk
Apache License 2.0

Subsetting the database when using poppunk_visualise --cytoscape with --include-files #196

Closed by muppi1993 2 years ago

muppi1993 commented 2 years ago

Versions

poppunk 2.4.0
poppunk_sketch 1.7.4

Command used and output returned

poppunk_visualise --ref-db GPS_v4 --query-db poppunk_clusters --cytoscape --output example_cytoscape --tree none --include-files gps_cluster3_list.txt --network-file GPS_v4/GPS_v4_graph.gt 

Output: one .graphml and two .csv files

Describe the bug

I tried to only include a subset of the dataset in the output with --include-files, which worked fine for the --microreact output. However, the .graphml network contains all isolates from the database rather than just those listed in gps_cluster3_list.txt.
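
To check how many extra isolates ended up in the network, the .graphml can be compared against the include list. This is a minimal sketch using only the Python standard library. It assumes the isolate name is stored in a node data field whose attr.name is "id" (an assumption about the export; change name_attr if yours differs) and that the include file has one name per line. The helper names graphml_node_names and unexpected_nodes are hypothetical and not part of PopPUNK.

```python
import xml.etree.ElementTree as ET

# Standard GraphML namespace
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def graphml_node_names(graphml_path, name_attr="id"):
    """Return the isolate name of every node in a GraphML file.

    Assumption: names live in a node <data> element whose <key> has
    attr.name == name_attr. Falls back to the node's own id otherwise.
    """
    root = ET.parse(graphml_path).getroot()
    name_keys = {
        key.get("id")
        for key in root.findall("g:key", NS)
        if key.get("for") == "node" and key.get("attr.name") == name_attr
    }
    names = []
    for node in root.iter("{%s}node" % NS["g"]):
        name = node.get("id")
        for data in node.findall("g:data", NS):
            if data.get("key") in name_keys:
                name = (data.text or "").strip()
        names.append(name)
    return names


def unexpected_nodes(graphml_path, include_path):
    """List isolates present in the network but absent from the include list."""
    with open(include_path) as handle:
        wanted = {line.strip() for line in handle if line.strip()}
    return sorted(set(graphml_node_names(graphml_path)) - wanted)
```

Calling unexpected_nodes with the path to the .graphml and gps_cluster3_list.txt would return an empty list if the subsetting had worked.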

johnlees commented 2 years ago

This should be handled by this bit of code, which masks the nodes that are not included: https://github.com/johnlees/PopPUNK/blob/46aff5d5715b26a7582c733d6956cb4c78748a99/PopPUNK/plot.py#L488-L495

But maybe when we print the graphml it prints the masked nodes too? @nickjcroucher do you remember if this is the case?

nickjcroucher commented 2 years ago

All gets saved with the same function - masking can be a little tricky in graph-tool. Will take a look.

nickjcroucher commented 2 years ago

Also, @muppi1993 highlighted that running with just --cytoscape still generates a tree, which is not needed and is quite slow. I think we should change this default, unless there are any objections?

johnlees commented 2 years ago

> All gets saved with the same function - masking can be a little tricky in graph-tool. Will take a look.

I think we just need a GraphView (adding in #204)

> Also @muppi1993 highlighted that running with just --cytoscape still generates a tree, which is not needed and quite slow - I think we should change this default, unless there are any objections?

Agreed, will also add this in 2.5.0

sydelstan commented 4 months ago

poppunk_visualise --ref-db poppunk_clusters --output cytoscape_5 --cytoscape --network-file /poppunk_clusters/poppunk_clusters_refs_graph.gt --include-files strains.csv --external-clustering meta.csv

I have the same issue when running this command: several strains from the reference database are still included in the final Cytoscape output, even though it should include only the strains from the query/strain list.

@johnlees
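
One possible stopgap is to filter the exported .graphml after the fact, before loading it into Cytoscape. This is a rough sketch using only the Python standard library, not the fix made in PopPUNK itself. It assumes the strain name is stored in a node data field whose attr.name is "id" (an assumption about the export; adjust name_attr if needed). The function name subset_graphml is hypothetical.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def subset_graphml(in_path, out_path, keep_names, name_attr="id"):
    """Write a copy of a GraphML file containing only the named nodes.

    Edges touching a removed node are removed too. Returns the number of
    nodes dropped. Assumption: names live in a node <data> element whose
    <key> has attr.name == name_attr; otherwise the node's own id is used.
    """
    keep = set(keep_names)
    # Keep GraphML as the default namespace so tags are written unprefixed
    ET.register_namespace("", GRAPHML_NS)
    tree = ET.parse(in_path)
    root = tree.getroot()

    def q(tag):
        return "{%s}%s" % (GRAPHML_NS, tag)

    name_keys = {
        key.get("id")
        for key in root.findall(q("key"))
        if key.get("for") == "node" and key.get("attr.name") == name_attr
    }

    n_dropped = 0
    for graph in root.findall(q("graph")):
        dropped_ids = set()
        # findall returns a list, so removing while looping is safe
        for node in graph.findall(q("node")):
            name = node.get("id")
            for data in node.findall(q("data")):
                if data.get("key") in name_keys:
                    name = (data.text or "").strip()
            if name not in keep:
                dropped_ids.add(node.get("id"))
                graph.remove(node)
        for edge in graph.findall(q("edge")):
            if edge.get("source") in dropped_ids or edge.get("target") in dropped_ids:
                graph.remove(edge)
        n_dropped += len(dropped_ids)

    tree.write(out_path, xml_declaration=True, encoding="utf-8")
    return n_dropped
```

The keep_names argument would be the set of strain names read from the list passed to --include-files. The input file is left untouched, so the original network is still available if the filtered one looks wrong.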