Knowledge-Graph-Hub / automate-pheno-comparisons

Jenkins-based automation of phenotype semantic similarity on PHENIO with Semsimian.
BSD 3-Clause "New" or "Revised" License

Determine whether PHENIO edges are necessary in calculating HP vs HP semsim #31

Closed: caufieldjh closed this issue 4 months ago

caufieldjh commented 5 months ago

As per discussion with @justaddcoffee and @julesjacobsen, the most recent build prepares semsim values for both of the following:

So how do they differ? Below, I will refer to HP vs HP through PHENIO as "PHENIO" and to HP vs HP through HP alone as "Alone".

Compressed size:

Uncompressed size of semsim table alone:

justaddcoffee commented 5 months ago

Uncompressed size of semsim table alone:

PHENIO: ~4.2 GB, 22332765 lines

Alone: ~4.3 GB, 23580617 lines

This might seem weird, but it is plausible: using HP alone possibly produces more pairs of HP terms that meet the IC cutoff, so the file is bigger.
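One way to check this would be to compare which term pairs appear in each table. The Alone table has 1247852 more lines than the PHENIO table, so if the explanation holds, most of the difference should show up as pairs present only in Alone. Below is a minimal sketch in Python, assuming the tables are tab-separated with a header row and the two term IDs in the first two columns. That layout, the function names `load_pairs` and `compare_pairs`, and the tiny in-memory tables with placeholder IDs are all assumptions for illustration, not the actual build output.

```python
import csv
import io


def load_pairs(handle, has_header=True):
    """Return the set of (subject, object) ID pairs from a tab-separated table.

    Assumes (hypothetically) that the first two columns hold the term IDs.
    """
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row if present
    return {(row[0], row[1]) for row in reader if len(row) >= 2}


def compare_pairs(phenio_pairs, alone_pairs):
    """Count pairs shared by both tables and pairs unique to each."""
    return {
        "shared": len(phenio_pairs & alone_pairs),
        "phenio_only": len(phenio_pairs - alone_pairs),
        "alone_only": len(alone_pairs - phenio_pairs),
    }


# Made-up miniature tables with placeholder IDs, not real semsim output.
phenio_tsv = (
    "subject_id\tobject_id\tscore\n"
    "TERM:1\tTERM:2\t0.5\n"
    "TERM:1\tTERM:4\t0.3\n"
)
alone_tsv = (
    "subject_id\tobject_id\tscore\n"
    "TERM:1\tTERM:2\t0.6\n"
    "TERM:1\tTERM:3\t0.4\n"
    "TERM:2\tTERM:3\t0.2\n"
)

summary = compare_pairs(
    load_pairs(io.StringIO(phenio_tsv)),
    load_pairs(io.StringIO(alone_tsv)),
)
print(summary)
```

For the real tables, the same functions could be called on opened files instead of `io.StringIO` objects. Holding more than 20 million pairs per table in Python sets would need several GB of memory, so a sorted-file comparison or a database join may be a better fit at that scale.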

caufieldjh commented 5 months ago

See also https://github.com/Knowledge-Graph-Hub/automate-pheno-comparisons/issues/30

caufieldjh commented 5 months ago

URLs for the above:

caufieldjh commented 4 months ago

@julesjacobsen reports that the similarity tables with PHENIO edges result in better performance in Exomizer tests than the HP vs self similarities alone.