Summary
This PR fixes the following issues:

Issue#33
- Added a `ManualDumper` for the latest Wellderly datasource (received from Chunlei; generated by Dr. Torkamani)
- Added a TSV parser for the Wellderly data
- Modified the original uploader accordingly
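The parser itself is not shown in the PR. As a rough illustration of what a Wellderly TSV parser can look like, the sketch below reads tab-separated rows with Python's `csv.DictReader` and builds MyVariant.info-style documents keyed by genomic HGVS IDs (e.g. `chr1:g.1000A>G`). The function name `parse_wellderly_tsv`, the column names (`chrom`, `pos`, `ref`, `alt`) and the document layout under the `wellderly` key are assumptions, not the PR's code. Only SNVs are converted, because indels need proper HGVS normalization.

```python
import csv

# Hypothetical column names; the real Wellderly TSV header may differ.
CHROM, POS, REF, ALT = "chrom", "pos", "ref", "alt"


def parse_wellderly_tsv(lines):
    """Yield MyVariant.info-style documents from Wellderly TSV lines (hypothetical parser).

    Only SNVs are converted; indels are skipped because their HGVS IDs
    require normalization that this sketch does not attempt.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        chrom = row[CHROM].removeprefix("chr")  # str.removeprefix needs Python 3.9+
        pos = int(row[POS])
        ref, alt = row[REF].upper(), row[ALT].upper()
        if len(ref) != 1 or len(alt) != 1:
            continue  # not an SNV
        # Keep every remaining column as-is under the "wellderly" key.
        extra = {k: v for k, v in row.items() if k not in (CHROM, POS, REF, ALT)}
        yield {
            "_id": f"chr{chrom}:g.{pos}{ref}>{alt}",
            "wellderly": {"chrom": chrom, "pos": pos, "ref": ref, "alt": alt, **extra},
        }


sample = "chrom\tpos\tref\talt\taf\nchr1\t1000\tA\tG\t0.25\n1\t2000\tAT\tA\t0.1"
for doc in parse_wellderly_tsv(sample.splitlines()):
    print(doc["_id"])  # prints chr1:g.1000A>G (the AT>A deletion is skipped)
```

Because `csv.DictReader` accepts any iterable of strings, an uploader could pass an open file handle instead of the in-memory sample.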

Issue#91
- Applied the correct storage class to `ClingenUploader`, the only uploader that could have caused the problem
  - P.S. ClinGen data is no longer available, so the fix is NOT tested
  - P.S. The storage class to be applied originally extended `EncodeLongHGVSIDStorage`, which disappeared in the refactoring introduced to fix Issue#110
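In BioThings-style hubs an uploader typically selects its storage through a `storage_class` class attribute, so a fix like this one can come down to a one-line override. The sketch below uses stand-in classes (not the real BioThings or MyVariant.info code) to show that override, plus a small audit that lists uploaders whose storage does not derive from `MyVariantBasicStorage`, which is one way to check that `ClingenUploader` was the only affected uploader. The class `ClingenUploader` ends up with, the `BaseUploader`/`OtherUploader` names and the `uploaders_missing_storage` helper are all assumptions for illustration.

```python
class BasicStorage:
    """Stand-in for a generic storage that writes documents unchanged."""


class MyVariantBasicStorage(BasicStorage):
    """Stand-in for the MyVariant.info base storage that encodes long _id values."""


class BaseUploader:
    """Stand-in uploader base class; subclasses choose storage via a class attribute."""

    storage_class = BasicStorage


class ClingenUploader(BaseUploader):
    # The kind of fix described above: override the default storage class
    # (assumed here to be MyVariantBasicStorage).
    storage_class = MyVariantBasicStorage


class OtherUploader(BaseUploader):  # hypothetical uploader that still uses the default
    pass


def uploaders_missing_storage(uploaders, required=MyVariantBasicStorage):
    """Return the names of uploaders whose storage_class does not derive from `required`."""
    return [u.__name__ for u in uploaders if not issubclass(u.storage_class, required)]


print(uploaders_missing_storage([ClingenUploader, OtherUploader]))  # ['OtherUploader']
```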

Issue#110
- `src/tests/utils/hgvs.py`
  - Moved the `encode_long_hgvs_id()` function into the newly added `DocEncoder` as a class method
  - Added the `DocEncoder.encode_long_ref_alt_seq()` method to solve the issue
  - … `doc["_seqhashed"]` dictionary
- `src/hub/dataload/storage.py`
  - Deleted `EncodeLongHGVSIDStorage`, whose functionality was moved to `MyVariantBasicStorage`
  - Refactored `MyVariantBasicStorage` (using the above `DocEncoder` class) and made it the new base class for other storage classes in MyVariant.info
  - Added a `MyVariantTrimmingStorage` (using the above `DocEncoder` class) specifically for ClinVar (so far the only datasource with long ref/alt sequences)
  - The difference between `MyVariantBasicStorage` and `MyVariantTrimmingStorage` is that:
    - `MyVariantBasicStorage` only encodes long `_id` values
    - `MyVariantTrimmingStorage` encodes both long `_id` values and long ref/alt sequences
  - P.S. Running ClinVar locally is difficult (data size, XML parsing, etc.), so `MyVariantTrimmingStorage` was NOT tested on ClinVar data (only on Wellderly data)
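The PR does not include the encoder code, so the following is only a sketch of how the two storage classes might differ when both delegate to a `DocEncoder`. It assumes SHA-1 hashing via `hashlib`, a 512-character `_id` limit, a 1000-character ref/alt limit, a `source` argument naming the subdocument (e.g. `clinvar`), a `prepare()` hook on the storages, and a `_seqhashed` layout that records the hash and original length of each replaced field. All of these are hypothetical; the real classes extend BioThings storage classes rather than plain `object`.

```python
import hashlib

# Hypothetical thresholds; the values used in MyVariant.info may differ.
MAX_ID_LEN = 512
MAX_SEQ_LEN = 1000


class DocEncoder:
    """Sketch of a DocEncoder-like helper (behaviour assumed, not taken from the PR)."""

    @staticmethod
    def _digest(text):
        # SHA-1 is used here only for illustration; any stable hash would do.
        return hashlib.sha1(text.encode("utf-8")).hexdigest()

    @classmethod
    def encode_long_hgvs_id(cls, doc, max_len=MAX_ID_LEN):
        """Shorten an over-long _id to a readable prefix plus a hash of the full id."""
        _id = doc["_id"]
        if len(_id) > max_len:
            digest = cls._digest(_id)
            prefix_len = max_len - len(digest) - 1  # reserve room for "_" + digest
            doc["_id"] = _id[:prefix_len] + "_" + digest
        return doc

    @classmethod
    def encode_long_ref_alt_seq(cls, doc, source, max_len=MAX_SEQ_LEN):
        """Replace long ref/alt sequences under doc[source] with their hashes.

        Each replaced field is recorded in doc["_seqhashed"] (layout assumed).
        """
        subdoc = doc.get(source, {})
        for field in ("ref", "alt"):
            seq = subdoc.get(field)
            if isinstance(seq, str) and len(seq) > max_len:
                digest = cls._digest(seq)
                subdoc[field] = digest
                doc.setdefault("_seqhashed", {})[f"{source}.{field}"] = {
                    "hash": digest,
                    "length": len(seq),
                }
        return doc


class MyVariantBasicStorage:
    """Stand-in for the base storage: only encodes long _id values."""

    def prepare(self, doc):  # hypothetical hook name
        return DocEncoder.encode_long_hgvs_id(doc)


class MyVariantTrimmingStorage(MyVariantBasicStorage):
    """Stand-in for the trimming storage: also encodes long ref/alt sequences."""

    def __init__(self, source="clinvar"):
        self.source = source

    def prepare(self, doc):
        doc = super().prepare(doc)
        return DocEncoder.encode_long_ref_alt_seq(doc, self.source)


doc = {"_id": "chr1:g.101_1300del" + "A" * 1200,
       "clinvar": {"ref": "G" + "A" * 1200, "alt": "G"}}
doc = MyVariantTrimmingStorage().prepare(doc)
print(len(doc["_id"]), doc["_seqhashed"]["clinvar.ref"]["length"])  # 512 1201
```

Subclassing keeps the `_id` encoding in one place: the trimming storage runs the base encoding first and only adds the ref/alt step, matching the PR's description that the basic storage encodes only long `_id` values while the trimming storage encodes both.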

Issue#115