chanzuckerberg / shasta

[MOVED] Moved to paoloshasta/shasta. De novo assembly from Oxford Nanopore reads

Add script to compare phasing with MarginPhase #262

Closed rlorigro closed 3 years ago

rlorigro commented 3 years ago

This script takes two files containing phased read names from Pepper-Marginphase-DeepVariant and compares them with the phasing described in Shasta's Phasing.csv. It prints summary statistics, and every read whose phase differs between the two datasets is written to a CSV.

Example stdout:

Evaluating component: 5
Shasta phase counts: [368, 397]
Margin phase counts: [398, 367]
Confusion matrix:
[1, 367]
[397, 0]
ARI: 0.9947712416020529

(where ARI = adjusted Rand index)
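
For readers who want to check the numbers, here is a minimal sketch of how a confusion matrix and ARI like the ones above can be computed. It is not the script from this PR: the function names `confusion_matrix` and `adjusted_rand_index` are hypothetical, the per-read labels are synthetic and chosen only to reproduce the example counts, and the row/column orientation (Shasta as rows, Margin as columns) is an assumption inferred from the phase counts.

```python
from collections import Counter
from math import comb


def confusion_matrix(a, b, n_phases=2):
    """Rows index the first labeling (assumed Shasta), columns the second (assumed Margin)."""
    counts = Counter(zip(a, b))
    return [[counts[(i, j)] for j in range(n_phases)] for i in range(n_phases)]


def adjusted_rand_index(matrix):
    """Adjusted Rand index computed from a contingency (confusion) matrix."""
    n = sum(sum(row) for row in matrix)
    if n < 2:
        return 1.0  # no read pairs to compare
    pairs_cells = sum(comb(x, 2) for row in matrix for x in row)
    pairs_rows = sum(comb(sum(row), 2) for row in matrix)
    pairs_cols = sum(comb(sum(col), 2) for col in zip(*matrix))
    expected = pairs_rows * pairs_cols / comb(n, 2)
    maximum = (pairs_rows + pairs_cols) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both labelings are trivial
    return (pairs_cells - expected) / (maximum - expected)


# Synthetic per-read labels that reproduce the example matrix above.
shasta = [0] * 368 + [1] * 397
margin = [0] * 1 + [1] * 367 + [0] * 397

matrix = confusion_matrix(shasta, margin)
ari = adjusted_rand_index(matrix)
print(matrix)
print(ari)
```

The ARI does not depend on which haplotype each tool happens to call 0 or 1, which is why an almost entirely anti-diagonal matrix like this one still scores close to 1.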

Example CSV log:

OrientedReadId,ReadName,Component,ShastaPhase,MarginPhase
798-1,39be7cb7-4ea1-4d3d-8135-0ebd50e895de,0,0,1
3327-1,6315035e-25d3-4fa6-8432-fcb5525024af,0,0,1
194-1,c853b653-c9aa-4af5-bfa6-c71430e8f018,0,0,1
2861-1,ce91af8d-7f4c-4d65-85dc-de7c90c31889,0,0,1
1823-1,b30e612d-4bf6-4b7a-9540-f9ee16ce22db,0,0,1
2831-1,6d8a8961-c1ff-4244-bf6f-f17a934b1a67,0,0,1
3627-1,b775fb67-fa1b-4586-b156-e28c9d799a12,0,0,1
2083-1,7de64bc4-cbd4-4265-8e5d-5963c408e6d8,0,0,1
3567-1,fa9fe977-d864-4532-9a6d-4af95a3cee22,0,1,0
798-1,39be7cb7-4ea1-4d3d-8135-0ebd50e895de,1,0,1
3327-1,6315035e-25d3-4fa6-8432-fcb5525024af,1,0,1
194-1,c853b653-c9aa-4af5-bfa6-c71430e8f018,1,0,1
2861-1,ce91af8d-7f4c-4d65-85dc-de7c90c31889,1,0,1
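
One way to summarize a log like this is to tally the logged reads per component and phase pair. The sketch below is an illustration only, not part of the PR: `tally_disagreements` is a hypothetical helper, and the embedded text is a three-row excerpt of the example log rather than a real output file.

```python
import csv
import io
from collections import Counter

# A few rows copied from the example log above; a real run would open the CSV file instead.
log_text = """OrientedReadId,ReadName,Component,ShastaPhase,MarginPhase
798-1,39be7cb7-4ea1-4d3d-8135-0ebd50e895de,0,0,1
3567-1,fa9fe977-d864-4532-9a6d-4af95a3cee22,0,1,0
798-1,39be7cb7-4ea1-4d3d-8135-0ebd50e895de,1,0,1
"""


def tally_disagreements(handle):
    """Count logged reads per (component, Shasta phase, Margin phase)."""
    tally = Counter()
    for row in csv.DictReader(handle):
        key = (int(row["Component"]), int(row["ShastaPhase"]), int(row["MarginPhase"]))
        tally[key] += 1
    return tally


tally = tally_disagreements(io.StringIO(log_text))
for (component, shasta_phase, margin_phase), count in sorted(tally.items()):
    print(f"component={component} shasta={shasta_phase} margin={margin_phase} reads={count}")
```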