TCP-Lab / SeqLoader

Constructors and methods for `xSeries` and `xModel` S3 classes

Implement a Model-level procedure for gene annotation synthesis #2

Open Feat-FeAR opened 3 months ago


At the Model level (i.e., in `geneStats.xModel`), we have to account for the possibility that the genomes used in the different Series (studies) differ in size or do not contain exactly the same elements (e.g., because they come from different releases of the assembly); this is what the Meta-analysis Inclusion Criterion (`maic`) option is for. Likewise, however, we cannot be sure that the annotations that may come with the different Series fully overlap (even when using `maic = exclusive`).

The final table of summary descriptive statistics can always be re-annotated from scratch based on Ensembl (ENS) IDs. However, to keep the original annotation information while removing the redundancy that would make the final table unreadable, we need to implement an annotation synthesis procedure. Here is a proposal:

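1. Merge the annotations from all the Series by gene ID, keeping everything (full outer join):

   ```r
   # all = TRUE keeps IDs found in any Series; same-named columns get .x/.y suffixes
   model |> lapply(\(series) series$annotation) |>
       Reduce(\(x, y) merge(x, y, by = "IDs", all = TRUE), x = _)
   ```
2. Then, for each group of columns that share the same original name (which `merge()` disambiguates with `.x`/`.y` suffixes), collapse the unique entries associated with the same ENS ID into a single comma-separated value;
3. Finally, merge this global annotation with `xModel_stats`.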
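The three steps above can be sketched in base R as follows. This is only a sketch under stated assumptions, not the package's implementation: `synthesize_annotation()` is a hypothetical helper, and each per-Series annotation is assumed to be a data frame with an `IDs` column (as in the `merge()` call above). Instead of chaining `merge()` calls, which renames same-named columns with `.x`/`.y` suffixes, it stacks the aligned tables and then collapses the unique non-missing entries per ID, which produces the same synthesis in one pass.

```r
# Hypothetical helper (not part of SeqLoader): build one annotation table
# from a list of per-Series annotation data frames. Assumes each data frame
# has an ID column (here "IDs") plus any number of annotation columns.
synthesize_annotation <- function(annotations, id_col = "IDs", sep = ", ") {
  # Union of all column names across the Series, with the ID column first
  all_cols <- unique(c(id_col, unlist(lapply(annotations, names))))

  # Align every table to the same set of columns (absent ones filled with
  # NA) and coerce to character so that values can be pooled
  aligned <- lapply(annotations, function(df) {
    cols <- lapply(setNames(all_cols, all_cols), function(col) {
      if (col %in% names(df)) as.character(df[[col]]) else rep(NA_character_, nrow(df))
    })
    as.data.frame(cols, stringsAsFactors = FALSE, check.names = FALSE)
  })
  # Stacking (instead of merging) keeps same-named columns together
  stacked <- do.call(rbind, aligned)

  # Keep the unique, non-missing entries of a column and join them with `sep`
  collapse <- function(x) {
    x <- unique(x[!is.na(x) & x != ""])
    if (length(x) == 0) NA_character_ else paste(x, collapse = sep)
  }

  # One group per gene ID, in order of first appearance
  ids <- unique(stacked[[id_col]][!is.na(stacked[[id_col]])])
  groups <- split(stacked[setdiff(all_cols, id_col)],
                  factor(stacked[[id_col]], levels = ids))
  collapsed <- do.call(rbind, lapply(groups, function(g) {
    as.data.frame(lapply(g, collapse),
                  stringsAsFactors = FALSE, check.names = FALSE)
  }))

  result <- data.frame(ids, collapsed, row.names = NULL,
                       stringsAsFactors = FALSE, check.names = FALSE)
  names(result)[1] <- id_col
  result
}

# Toy example with two partially overlapping annotations
a1 <- data.frame(IDs = c("ENSG01", "ENSG02"), Symbol = c("AAA", "BBB"))
a2 <- data.frame(IDs = c("ENSG02", "ENSG03"), Symbol = c("BBB2", "CCC"),
                 Biotype = c("protein_coding", "lncRNA"))
annot <- synthesize_annotation(list(a1, a2))
print(annot)
```

Here `ENSG02` ends up with `Symbol` equal to `"BBB, BBB2"`, and `ENSG01` gets `NA` for `Biotype` because no Series annotates it. In the Model context the input would presumably be `lapply(model, \(series) series$annotation)`, and step 3 could then be a left join such as `merge(xModel_stats, annot, by = "IDs", all.x = TRUE)`, assuming the statistics table also carries an `IDs` column.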

Also document this procedure in the README (`README.md`).