Open XiangweiZhai opened 1 year ago
Hi @XiangweiZhai - This is most likely expected behavior of GLIPH2. As discussed in the GLIPH2 publication: "In this new version, first, a TCR can be assigned to more than one cluster." Two tests you could try to confirm this are:
turboGliph::turbo_gliph() function - This is an implementation of GLIPH1, which doesn't allow TCRs to be found in multiple clusters. If your multi-cluster TCRs turn into single-cluster TCRs, that is expected behavior.
Hi, thank you so much for developing this comprehensive and efficient package! In my understanding, if a few CDR3b sequences are assigned to the same cluster and labeled with the same tag, it is because these CDR3b sequences have similar protein structures and can recognize the same antigen. So it should be impossible for a given sequence to have multiple structures and appear in different clusters. The results of turbo_gliph() are in line with this idea, but the results of gliph2() are unexpected.
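library(dplyr)    # provides %>% and select()
library(stringr)  # provides str_detect()
res_gliph2 <- turboGliph::gliph2(cdr3_sequences = gliph_input_data, n_cores = 10)
gliph2Properties <- res_gliph2$cluster_properties
# Flag the clusters whose member list contains the sequence of interest
seqMatch <- str_detect(gliph2Properties$members, "CANSPTSSTTSYEQYF")
gliph2Properties[seqMatch, ] %>% select(type, tag, cluster_size, members)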
Why does the single sequence "CANSPTSSTTSYEQYF" appear in 15 different clusters, each with its own tag? The same thing happens with other sequences: "CNARGQAITEKLFF", "CASSPWGQTASSYNEQFF", "CASSIRSAYEQYF"......
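To see how widespread this is without checking one sequence at a time, you can count the number of clusters each sequence appears in. The following is only a minimal sketch in base R on a toy data frame. The tags and the table itself are made up for illustration, and it assumes that `members` stores the sequences of a cluster as one whitespace-separated string, which may not match the real `cluster_properties` layout exactly.

```r
# Toy stand-in for res_gliph2$cluster_properties (hypothetical values).
# Assumption: `members` holds the CDR3b sequences of one cluster as a
# single whitespace-separated string.
toy_properties <- data.frame(
  tag = c("motif_A", "motif_B", "motif_C"),
  members = c(
    "CANSPTSSTTSYEQYF CASSIRSAYEQYF",
    "CANSPTSSTTSYEQYF CNARGQAITEKLFF",
    "CASSPWGQTASSYNEQFF CANSPTSSTTSYEQYF"
  ),
  stringsAsFactors = FALSE
)

# Split each member string into individual sequences, one vector per cluster.
member_list <- strsplit(toy_properties$members, "\\s+")

# Drop duplicates within a cluster so each cluster counts at most once.
member_list <- lapply(member_list, unique)

# Count in how many clusters each sequence occurs, most frequent first.
clusters_per_seq <- sort(table(unlist(member_list)), decreasing = TRUE)

# Keep only the sequences assigned to more than one cluster.
multi_cluster <- clusters_per_seq[clusters_per_seq > 1]
print(multi_cluster)
```

On real output, replacing `toy_properties` with `res_gliph2$cluster_properties` should give the same kind of table, provided the assumption about the `members` format holds.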