gagneurlab / MMSplice_MTSplice

Tissue-specific variant effect predictions on splicing
MIT License

writeVCF not implemented. I wrote some implementation #32

beskns closed 3 years ago

beskns commented 4 years ago

columns = [
    'gene_name', 'transcript_id', 'exons',
    'ref_exon', 'alt_exon',
    'ref_donor', 'alt_donor',
    'ref_acceptor', 'alt_acceptor',
    'ref_acceptorIntron', 'alt_acceptorIntron',
    'ref_donorIntron', 'alt_donorIntron',
    'delta_logit_psi', 'pathogenicity', 'efficiency'
]

def writeVCF(vcf_in, vcf_out, predictions):
    from cyvcf2 import VCF, Writer
    vcf = VCF(vcf_in)
    # Register the new INFO field; 'String' is the VCF type for free text
    vcf.add_info_to_header({
        'ID': 'mmsplice',
        'Description': 'MMSplice splice variant effect. Format:' + '|'.join(columns),
        'Type': 'String',
        'Number': '.'
    })
    w = Writer(vcf_out, vcf)

    for var in vcf:
        # var.ALT is a list; this string must match the ID format in predictions.ID
        ID = f"{var.CHROM}:{var.POS}:{var.REF}:{var.ALT}"
        pred = predictions[predictions.ID == ID]
        # Boolean indexing never returns None, so test for an empty frame instead
        if not pred.empty:
            pred_4_var = [
                '|'.join([str(row[k]) for k in columns[:3]]) + '|' +
                '|'.join([format(row[k], ".3f") for k in columns[3:]])
                for ind, row in pred.iterrows()
            ]
            var.INFO['mmsplice'] = '&'.join(pred_4_var)
        w.write_record(var)

    # Flush the output and release both file handles
    w.close()
    vcf.close()

MuhammedHasan commented 4 years ago

Thanks for your contribution but writeVCF is already implemented.

https://github.com/gagneurlab/MMSplice/blob/7f4aeb8bfa6cd460bccc5db593c066d1691bf1f6/mmsplice/mmsplice.py#L193

If you think this implementation is lacking some of the features, please report them. We can improve the implementation.

beskns commented 4 years ago

It is only a stub. It does not work! (In reply to Muhammed Hasan's message of 25.09.2019, 19:48.)

Best regards, Н.С. Бескоровайный

s6juncheng commented 4 years ago

Thanks, @beskns. @MuhammedHasan, maybe we could put it in utils.py and import it at the top level in the __init__.py file for the next release.

tstohn commented 3 years ago

Is there any news regarding the writeVCF functionality? We would also be thrilled to use it at MedGen in Tübingen. Thanks, Tim

s6juncheng commented 3 years ago

This function is now implemented in mmsplice.utils.writeVCF https://github.com/gagneurlab/MMSplice_MTSplice/blob/dd91265d4eafdadaa75990dd425af5b583aed101/mmsplice/utils.py#L353

The new version is on PyPI and can be installed with pip.

Thanks, @beskns, for sharing your implementation. I made some modifications based on it.

After writing the predictions to a VCF file, you can read the output file into a pandas DataFrame with mmsplice.utils.read_vep https://github.com/gagneurlab/MMSplice_MTSplice/blob/dd91265d4eafdadaa75990dd425af5b583aed101/mmsplice/utils.py#L164

Hopefully, this works for your use case @tstohn.

I'll close the issue for now; please feel free to reopen it if there are further questions or requests.

tstohn commented 3 years ago

Hey Jun, thanks a lot. That works for me. The only thing I was wondering about is that I was getting a segfault when writing a variant list in which some variants have no mmsplice prediction. It seems to be caused by 'pred' in line 384 of utils.py being an empty DataFrame: it holds no values, yet values are accessed in line 387. If you cannot reproduce this, let me know and I'll have a deeper look into it on my machine. Thanks again & cheers, Tim
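
For illustration, here is a minimal sketch of the kind of guard that would skip variants without a prediction. It assumes the prediction table has an `ID` column, as in the snippet at the top of this thread. The helper name `format_predictions`, the shortened column lists and the toy data are hypothetical and are not taken from utils.py, and the sketch does not claim to reproduce or fix the reported segfault.

```python
import pandas as pd

# Hypothetical, shortened column lists for illustration only.
TEXT_COLUMNS = ['gene_name', 'transcript_id', 'exons']
SCORE_COLUMNS = ['delta_logit_psi', 'pathogenicity']


def format_predictions(predictions, variant_id):
    """Return the INFO string for one variant, or None if it has no prediction."""
    pred = predictions[predictions.ID == variant_id]
    # Boolean indexing yields an empty DataFrame (never None) when nothing matches.
    if pred.empty:
        return None
    return '&'.join(
        '|'.join([str(row[k]) for k in TEXT_COLUMNS] +
                 [format(row[k], '.3f') for k in SCORE_COLUMNS])
        for _, row in pred.iterrows()
    )


# Toy data (made up): one variant with a prediction, one without.
predictions = pd.DataFrame({
    'ID': ["1:100:A:['T']"],
    'gene_name': ['GENE1'],
    'transcript_id': ['TX1'],
    'exons': ['1:90-200:+'],
    'delta_logit_psi': [-0.25],
    'pathogenicity': [0.5],
})

with_pred = format_predictions(predictions, "1:100:A:['T']")
without_pred = format_predictions(predictions, "1:500:G:['C']")
```

A caller would set the INFO field only when the helper returns a string and write the record unchanged otherwise, so an empty `pred` is never indexed.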