EBISPOT / scxa_2_cxg

Extend CxG conversion to cover - `var` (genes), #7

Closed: dosumis closed this issue 4 months ago

anitacaron commented 5 months ago

The CxG schema defines only five columns for `var`: `feature_is_filtered`, `feature_biotype`, `feature_length`, `feature_name` and `feature_reference`. I've checked the columns available in `var` in SCXA.

In the CxG schema, `feature_name` should be the Ensembl name. However, in some SCXA experiments the values come from a mix of Ensembl, Ensembl Havana, Havana, and RefSeq, for instance 'AC242953.1', 'Mt-nd1', 'Mt-nd2', 'Mt-co1', 'AABR07000398.1', etc. Should I remove the rows that are not Ensembl?

I didn't find other mappings.

anitacaron commented 5 months ago

cc @dosumis

anitacaron commented 4 months ago

The only mandatory column is `feature_is_filtered`. I set it to `False` for all rows.
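For readers following along, here is a minimal sketch of that step using a plain pandas DataFrame as a stand-in for `adata.var`. The `gene_symbol` column and the two rows are illustrative assumptions, not taken from an actual SCXA experiment or from this repository's code.

```python
import pandas as pd

# Hypothetical stand-in for adata.var: gene IDs in the index,
# plus one made-up column ("gene_symbol") for illustration only.
var = pd.DataFrame(
    {"gene_symbol": ["Gnai3", "Cdc45"]},
    index=pd.Index(
        ["ENSMUSG00000000001", "ENSMUSG00000000028"], name="feature_id"
    ),
)

# Mark every gene as not filtered, as described in the comment above.
# Assigning a scalar broadcasts it to all rows and yields a bool column.
var["feature_is_filtered"] = False

print(var)
```

With a real AnnData object the same assignment would presumably be `adata.var["feature_is_filtered"] = False`.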

dosumis commented 4 months ago

My understanding is that we should use Ensembl names and IDs. Where available, can you set the index (ID) and name to Ensembl?

anitacaron commented 4 months ago

They're mixed within the rows; there isn't a separate column for each kind of value. Most of the index values are ENSMUSG IDs.
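One possible way to see how mixed the index is would be to split it with a pattern for Ensembl gene IDs. This is only a sketch: the regular expression is an assumption about the ID format (`ENS`, an optional species prefix, `G`, 11 digits, optional version suffix), `split_by_ensembl` is a hypothetical helper, and the sample values are placeholders built from the identifiers mentioned earlier in this thread.

```python
import re

# Assumed shape of an Ensembl gene ID, e.g. ENSMUSG00000000001 or ENSG00000139618.
ENSEMBL_GENE_ID = re.compile(r"^ENS[A-Z]*G\d{11}(\.\d+)?$")


def split_by_ensembl(ids):
    """Return (ensembl_like, other), preserving the input order."""
    ensembl_like, other = [], []
    for value in ids:
        if ENSEMBL_GENE_ID.match(value):
            ensembl_like.append(value)
        else:
            other.append(value)
    return ensembl_like, other


# Placeholder index values; a real run would pass list(adata.var.index).
index_values = ["ENSMUSG00000000001", "AC242953.1", "Mt-nd1", "AABR07000398.1"]
ensembl_ids, other_ids = split_by_ensembl(index_values)

print(f"{len(ensembl_ids)} Ensembl-like, {len(other_ids)} other: {other_ids}")
```

The `other_ids` list would then be the set of rows to either map to Ensembl or drop, depending on what is decided here.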