IDR / idr0005-toret-adhesion

Flybase updates #2

Closed sbesson closed 2 years ago

sbesson commented 2 years ago

Fixes #1

sbesson commented 2 years ago

@frances with the latest version of the script I get:

(base) sbesson@ls30630:idr0005-toret-adhesion (flybase_updates) $ python scripts/update_gene_ids.py 
INFO:root:Found 12976 validated genes (13434 hits in total)
INFO:root:Updated 12621/12989 gene rows with 2085 gene symbol updates and 5889 gene synonym updates
INFO:root:Found 803 validated genes (1690 hits in total)
INFO:root:Updated 1634/1658 gene rows with 296 gene symbol updates and 658 gene synonym updates

The validated annotation CSVs are available at: idr0005-screenA-annotation-validated.csv idr0005-screenB-annotation-validated.csv
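For reviewing the symbol changes in these CSVs, a minimal sketch along the following lines could list every row where the old and new Gene Symbol differ. It assumes the two symbols sit in spreadsheet columns O and P (0-based indices 14 and 15) and that the first row is a header; the function names are hypothetical and not part of `scripts/update_gene_ids.py`.

```python
import csv
import io

# Assumption: spreadsheet columns O and P hold the previous and the
# updated Gene Symbol respectively (0-based indices 14 and 15).
OLD_COL = 14
NEW_COL = 15


def changed_symbols(rows, old_col=OLD_COL, new_col=NEW_COL):
    """Return (row_number, old, new) for each data row whose symbol changed."""
    changes = []
    # Data rows start at spreadsheet row 2 because row 1 is the header.
    for row_number, row in enumerate(rows, start=2):
        if len(row) <= max(old_col, new_col):
            continue  # skip short or malformed rows
        old, new = row[old_col].strip(), row[new_col].strip()
        if old != new:
            changes.append((row_number, old, new))
    return changes


def read_changes(handle):
    """Read CSV text from an open file handle and report changed symbols."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row
    return changed_symbols(reader)


if __name__ == "__main__":
    # Tiny in-memory example with made-up rows, not real screen data.
    header = ",".join(f"col{i}" for i in range(16))
    rows = [
        ",".join([""] * 14 + ["Sep1", "Septin1"]),
        ",".join([""] * 14 + ["abc", "abc"]),
    ]
    sample = io.StringIO("\n".join([header] + rows) + "\n")
    for row_number, old, new in read_changes(sample):
        print(f"row {row_number}: {old} -> {new}")
```

To run it against one of the validated files, open the CSV with `open(path, newline="")` and pass the handle to `read_changes`.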

As discussed, can you take a look tomorrow at the new Gene Symbols (columns O vs P) to make sure the changes make sense? Once these changes are validated, the two remaining actions are:

sbesson commented 2 years ago

As a bonus of the normalisation, it looks like FlyBase has updated some Gene Symbols that were previously misread as dates by Excel (see https://www.theverge.com/2020/8/6/21355674/human-genes-rename-microsoft-excel-misreading-dates), e.g. Sep1 -> Septin1.
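To spot any remaining symbols with this problem, a rough heuristic like the one below could flag values that resemble a month name followed by a day number (at risk of conversion) or that already look like a converted date such as `1-Sep`. This is only an approximation of Excel's behaviour, and the names `classify`, `AT_RISK` and `CONVERTED` are hypothetical.

```python
import re

# Month names and abbreviations that a spreadsheet may interpret as dates.
MONTHS = (
    "jan|feb|mar|march|apr|april|may|jun|june|jul|july|"
    "aug|sep|sept|oct|nov|dec"
)

# e.g. "Sep1", "MARCH1", "Dec-1": month followed by one or two digits.
AT_RISK = re.compile(rf"^(?:{MONTHS})-?\d{{1,2}}$", re.IGNORECASE)
# e.g. "1-Sep": what the symbol looks like after it was turned into a date.
CONVERTED = re.compile(rf"^\d{{1,2}}-(?:{MONTHS})$", re.IGNORECASE)


def classify(symbol):
    """Return 'at_risk', 'converted', or None for a gene symbol string."""
    value = symbol.strip()
    if AT_RISK.match(value):
        return "at_risk"
    if CONVERTED.match(value):
        return "converted"
    return None


if __name__ == "__main__":
    for symbol in ["Sep1", "Septin1", "1-Sep", "MARCH1"]:
        print(symbol, classify(symbol))
```

Under this heuristic the updated symbol Septin1 is no longer flagged, whereas the old Sep1 is.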

sbesson commented 2 years ago

Closing for now, as we need more decisions before moving forward with such a re-annotation. The scripts can always be reused as a starting point if we want to revisit this body of work.