Closed TommyH-Tran closed 1 year ago
Hi @TommyH-Tran , can you please share the LG180_f0_0taxa_algCOG_e0C90.cluster_list and LG180_f0_0taxa_algOMCL_e0C90.cluster_list files? It would also help to see the output of the ls command at LG180_f0_0taxa_algCOG_e0C90 and LG180_f0_0taxa_algOMCL_e0C90 respectively, thanks, Bruno
I cd into both of those directories and then did ls, and the gene cluster files show up in the terminal.
Here are the cluster_list files: LG180_f0_0taxa_algCOG_e0C90.cluster_list.zip LG180_f0_0taxa_algOMCL_e0C90.cluster_list.zip LG180_f0_19taxa_algBDBH_e0C90.cluster_list.zip
Sorry @TommyH-Tran , cannot see what's wrong so far. Would it be possible for you to send me the folders LG180_f0_0taxa_algCOG_e0C90 , LG180_f0_0taxa_algOMCL_e0C90 and LG180_f0_19taxa_algBDBH_e0C90 compressed? Thanks, Bruno
Here are the folders: LG180_f0_0taxa_algCOG_e0C90.zip LG180_f0_0taxa_algOMCL_e0C90.zip LG180_f0_19taxa_algBDBH_e0C90.zip
Thanks @TommyH-Tran , it seems the reason for this output is the optional argument -t 19, which requires clusters in the intersection to contain exactly 19 sequences (single-copy) from 19 taxa. As you can see below, there are none among the OMCL / COG clusters:
grep -c "size=19 taxa=19" LG180_f0_19taxa_algBDBH_e0_C90_.cluster_list
270
grep -c "size=19 taxa=19" LG180_f0_0taxa_algCOG_e0_C90_.cluster_list
0
grep -c "size=19 taxa=19" LG180_f0_0taxa_algOMCL_e0_C90_.cluster_list
0
If you remove it the resulting intersection contains 144 clusters. Hope this helps, Bruno
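The grep checks above can be wrapped in a small helper that also reports how many clusters cover all taxa but are not single-copy. This is only a sketch: summarize_clusters is a hypothetical name, and it assumes each cluster line contains "size=N taxa=N" followed by a space or the end of the line, as the grep pattern above suggests.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of get_homologues.
# Usage: summarize_clusters FILE N_TAXA
# Assumes cluster lines look like "... size=<int> taxa=<int> ...".
summarize_clusters() {
  local file=$1 n=$2
  local all single
  # Clusters present in all N taxa, whatever their size (may include paralogs).
  all=$(grep -cE "size=[0-9]+ taxa=${n}( |\$)" "$file")
  # Clusters with exactly N sequences from N taxa (single-copy).
  single=$(grep -cE "(^| )size=${n} taxa=${n}( |\$)" "$file")
  echo "$file: taxa=$n clusters=$all single_copy=$single"
}

# Example call (file name taken from the thread):
# summarize_clusters LG180_f0_0taxa_algOMCL_e0_C90_.cluster_list 19
```

If the first number is large and the second is 0, the clusters do span all 19 taxa but each carries at least one extra copy, which is why -t 19 filters them all out.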
Yes, I want the intersection of single-copy clusters from 19 taxa from each of the three algorithms. How is it possible that in the pangenome run (-t 0) no single-copy clusters are identified among the COG and OMCL clusters? This is how I have usually done it and it has worked.
Are the 144 clusters single-copy among all three? Or will it be fixed if I run COG and OMCL with the -t 19 option and then try to create the intersection?
Hi @TommyH-Tran , 144 are shared clusters, not single-copy. Options I can see:
Could I not run COG and OMCL like this to get single-copy clusters and then run ./compare_clusters.pl? Or are you suggesting instead that I add -S and -e on top of those two runs?
./get_homologues.pl -d "/Users/klemonlab/Desktop/THT/THT_NACH/NACH_gbk_19" -n 8 -t 19 -C 90 -G
./get_homologues.pl -d "/Users/klemonlab/Desktop/THT/THT_NACH/NACH_gbk_19" -n 8 -t 19 -C 90 -M
This is what I suggest, if 90% identity is reasonable for your analysis:
./get_homologues.pl -d "/Users/klemonlab/Desktop/THT/THT_NACH/NACH_gbk_19" -n 8 -t 19 -C 90 -G -S 90 -e
./get_homologues.pl -d "/Users/klemonlab/Desktop/THT/THT_NACH/NACH_gbk_19" -n 8 -t 19 -C 90 -M -S 90 -e
I tried using what you suggested and still get 0 clusters. Is 90% identity too rigid?
It's probably too lenient. If you want single-copy clusters you need to separate the divergent copies, so you should increase it even more to see if that works, Bruno
Sorry for the delay, I set the limit to 99 and it still gave me back 0 clusters using the flags you suggested...
./get_homologues.pl -d "/Users/klemonlab/Desktop/THT/THT_NACH/NACH_gbk_20" -n 8 -t 20 -C 99 -G -S 99 -e
At this point I guess you should inspect the pangenome matrix to see whether it is always the same genome that has values > 1 in all clusters, or whether all genomes behave like that
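One way to do that check from the terminal is sketched below. It assumes a tab-separated pangenome matrix with a header row, one row per genome, one column per cluster, and the genome name in the first column; count_multicopy_per_genome is a hypothetical name and the file name in the example is a placeholder, so adjust both to your actual output.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of get_homologues.
# Usage: count_multicopy_per_genome MATRIX.tab
# Assumed layout: tab-separated, header in row 1, genome name in column 1,
# per-cluster copy counts in the remaining columns.
count_multicopy_per_genome() {
  awk -F'\t' 'NR > 1 {
    n = 0
    # Count the clusters where this genome contributes more than one sequence.
    for (i = 2; i <= NF; i++) if ($i + 0 > 1) n++
    print $1 "\t" n
  }' "$1"
}

# Example call (placeholder file name):
# count_multicopy_per_genome pangenome_matrix.tab | sort -k2,2nr
```

A single genome with a much higher count than the rest would point to that genome as the source of the extra copies; similar counts everywhere would suggest the paralogs are spread across all genomes.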
I cannot create an intersection using ./compare_clusters.pl. I have checked inside the folders and the .faa and .fna gene clusters are present. The error then said to review the duplicated.cluster_list file, and it is completely blank. Here is the output: