Open yushengak47 opened 3 years ago
Hmm, this is a case that would require some modifications to fix. However, I will say that the gene activity scores for two genes that share a promoter peak will be identical, so if you have a list of the gene sets that share a promoter, you can add the corresponding rows yourself.
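As a rough illustration of that workaround, here is a minimal base R sketch. It assumes the gene activity matrix has gene names as row names and cells as columns. The helper `add_shared_promoter_rows`, the `shared_promoters` list, and the toy values are hypothetical and not part of the package.

```r
# Toy gene activity matrix: rows are genes, columns are cells (made-up values).
activity <- matrix(
  c(0.2, 0.0, 0.5,
    0.1, 0.3, 0.0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("HES2", "TP73"), c("cell1", "cell2", "cell3"))
)

# Hypothetical list of gene sets that share a promoter peak.
shared_promoters <- list(c("HES2", "ESPN"))

# For each set, copy the row of a gene that is already in the matrix
# to every gene of the set that is missing from it.
add_shared_promoter_rows <- function(mat, gene_sets) {
  for (genes in gene_sets) {
    present <- intersect(genes, rownames(mat))
    missing <- setdiff(genes, rownames(mat))
    # Nothing to do if no gene of the set has a row, or none is missing.
    if (length(present) == 0 || length(missing) == 0) next
    new_rows <- mat[rep(present[1], length(missing)), , drop = FALSE]
    rownames(new_rows) <- missing
    mat <- rbind(mat, new_rows)
  }
  mat
}

expanded <- add_shared_promoter_rows(activity, shared_promoters)
```

This relies on the scores being identical for genes that share a promoter peak, as described above. The real matrix is likely a sparse matrix rather than a base matrix, so the indexing and `rbind` steps may need checking against that class.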
I will leave this open and hopefully find time to find a solution in the future.
Hi,
I found that when a peak overlaps the promoters of two or more genes, annotate_cds_by_site with its default settings records only one of them in the 'gene' column of fData(input_cds). As a result, some genes are missing from the gene activity matrix. I tried setting all = TRUE when running annotate_cds_by_site, which does list multiple gene names in the 'gene' column. However, build_gene_activity_matrix does not seem to handle this properly. The generated matrix looks redundant and problematic; for example, it has rows named "HES2,HES2,HES2,HES2", "ESPN,ESPN,HES2", etc.
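For reference, a minimal base R sketch of how comma-separated labels like these could be reduced to unique gene sets. The `labels` vector is illustrative only and simply mimics the row names above plus one single-gene label.

```r
# Example labels mimicking the redundant row names seen with all = TRUE.
labels <- c("HES2,HES2,HES2,HES2", "ESPN,ESPN,HES2", "TP73")

# Split each label on commas and keep each gene name once per label.
gene_sets <- lapply(strsplit(labels, ",", fixed = TRUE), unique)

# Keep only the distinct sets that contain more than one gene,
# i.e. the genes that appear to share a promoter peak.
shared <- unique(gene_sets[lengths(gene_sets) > 1])
```

Here `shared` holds one character vector per group of genes listed together, which may be a starting point for working out which genes share a promoter.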
Any ideas on how to solve this problem?
Thanks