WormBase / pseudoace

Modelling the WormBase ACeDB database in datomic.

Possibly speed up migration? #72

Closed: mgrbyte closed this issue 7 years ago

mgrbyte commented 7 years ago

@khowe @epaule @gw3 @Paul-Davis

I'm wondering whether excluding certain classes (e.g. View and UserSession) could speed up the migration.

The following are the classes dumped by the ACeDB tace command during the migration:

Accession_number
Analysis
Anatomy_function
Anatomy_name
Anatomy_term
Antibody
AO_code
Author
CDS
Cell
Cell_group
Clone
Condition
Construct
Contig
Database
Database_field
DNA
DO_term
2_point_data
Expression_cluster
Expr_pattern
Expr_profile
Feature
Feature_data
Gene
Gene_class
Gene_cluster
Gene_name
Genetic_code
GO_annotation
GO_code
GO_term
Grid
Homol_data
Homology_group
Interaction
Jade
KeySet
Laboratory
Library
Life_stage
Locus
LongText
Map
Mass_spec_experiment
Mass_spec_peptide
MatchTable
Method
Microarray
Microarray_experiment
Microarray_results
Molecule
Motif
Movie
Multi_pt_data
Oligo
Oligo_set
Operon
Paper
PATO_term
PCR_product
Peptide
Person
Person_name
Phenotype
Phenotype_name
Picture
Position_Matrix
Pos_neg_data
Protein
Pseudogene
Rearrangement
Reconstruction
RNAi
SAGE_experiment
SAGE_tag
Sequence
Sequence_collection
SK_map
SO_term
Species
Strain
Structure_data
Table
TableResult
Transcript
Transcription_factor
Transgene
Transposon
Transposon_family
Tree
TreeNode
UserSession
Variation
Variation_name
View
WBProcess
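A minimal sketch of the exclusion idea, assuming the migration could filter this class list before asking tace to dump each class. The function name `classes_to_dump` and the hard-coded exclusion set are hypothetical and are not pseudoace's actual code:

```python
# Hypothetical sketch: not pseudoace's real dump step; names are illustrative.

# Classes proposed for exclusion in this issue (assumed, not a settled decision).
EXCLUDED_CLASSES = frozenset({"View", "UserSession"})


def classes_to_dump(dumped_classes, excluded=EXCLUDED_CLASSES):
    """Return the classes to dump, keeping their original order, minus exclusions."""
    return [cls for cls in dumped_classes if cls not in excluded]


if __name__ == "__main__":
    # A slice of the listing above, used as sample input.
    sample = ["Transposon_family", "Tree", "TreeNode", "UserSession",
              "Variation", "Variation_name", "View", "WBProcess"]
    print(classes_to_dump(sample))
```

Using a `frozenset` keeps the membership test cheap and prevents the default exclusion set from being mutated between calls.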

mgrbyte commented 7 years ago

These are the tace options used.

epaule commented 7 years ago

There are only 24 View objects, and they are quite small. The UserSessions, in contrast, number 47k entries, but to be fair they are also quite small.

I don't think we should import them into Datomic: the user should hang off the timestamps, so tracking the sessions is redundant.

And the View objects are specific to xace and the genetic map, so they can go too. The whole multi-point data thing could be stored in simpler ways.

epaule commented 7 years ago

TableResult and Life look a tad suspicious.

KeySet is another ACeDB-specific class that could potentially go, depending on whether we want to preserve its functionality in Datomic.

mgrbyte commented 7 years ago

My ~grep~ sed command was slightly off. I've updated the listing of classes above.

mgrbyte commented 7 years ago

Thanks all. Closing this for now, as I think there are many things of higher priority. NB: classes not mentioned in the annotated schema used to drive the import process will not have any of their data migrated into the Datomic database.
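Because the annotated schema drives the import, classes absent from it are dumped but never migrated. A minimal sketch of how one might list which dumped classes fall into each group, assuming the schema can be reduced to a set of ACeDB class names (the function `partition_by_schema` and the sample data are hypothetical; the real schema format and importer may differ):

```python
# Hypothetical sketch: assumes the annotated schema is available as a set of
# class names. This is not pseudoace's actual importer.

def partition_by_schema(dumped_classes, schema_classes):
    """Split dumped classes into (migrated, skipped), preserving dump order."""
    schema = set(schema_classes)
    migrated = [cls for cls in dumped_classes if cls in schema]
    skipped = [cls for cls in dumped_classes if cls not in schema]
    return migrated, skipped


if __name__ == "__main__":
    dumped = ["Gene", "View", "UserSession", "Variation"]
    schema = {"Gene", "Variation"}  # illustrative only
    migrated, skipped = partition_by_schema(dumped, schema)
    print("migrated:", migrated)
    print("skipped:", skipped)
```

The `skipped` list identifies classes whose dump time buys nothing under this assumption, which makes them natural candidates for leaving out of the tace dump altogether.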