gifford-lab / antibody-2019


README for training data? #1

Open jaekor91 opened 4 years ago

jaekor91 commented 4 years ago

Hello, could you add a README to the data/training data folder describing what each file is for?

Could you add the original training data to this directory?

https://github.com/gifford-lab/antibody-2019/tree/master/data/training%20data/Hold%20out%20classification

Also, could you explain why there are "J" characters in the sequences in the following file (and others)? Are they used as padding tokens for NNs? If so, is the padding added at random positions?

https://raw.githubusercontent.com/gifford-lab/antibody-2019/master/data/training%20data/Full%20regression/Lucentis_b/data.tsv
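The padding question can be checked locally with a minimal sketch like the one below. It assumes the sequences have already been read from the TSV into a list of strings; the column layout of data.tsv is not documented, so the loading step is left out. The function name `classify_j_positions` and the example sequences are hypothetical and not taken from the repository.

```python
from collections import Counter


def classify_j_positions(sequences, pad="J"):
    """Tally where the pad character occurs in each sequence.

    Categories: "none", "leading", "trailing", "both_ends", "internal".
    A sequence is "internal" if any pad character remains after
    stripping pad characters from both ends.
    """
    tally = Counter()
    for seq in sequences:
        if pad not in seq:
            tally["none"] += 1
            continue
        core = seq.strip(pad)  # remove pad runs at both ends
        if pad in core:
            tally["internal"] += 1
        elif seq.startswith(pad) and seq.endswith(pad):
            tally["both_ends"] += 1
        elif seq.startswith(pad):
            tally["leading"] += 1
        else:
            tally["trailing"] += 1
    return tally


# Made-up sequences for illustration only (not from the dataset).
example = ["ARDYJJ", "JJARDY", "ARJDY", "ARDY", "JARDJ"]
print(dict(classify_j_positions(example)))
```

If nearly all sequences fall under "internal", the "J" characters are probably not simple end padding; a mix of "leading", "trailing", and "both_ends" would be consistent with padding placed at varying offsets.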

Lastly, could you share the entire phagemid template sequence used? I am interested in looking at the entire antibody sequence available in addition to using the flanking sequence.

Thank you!

jaekor91 commented 4 years ago

@igsaber -- Thank you for uploading the classification training data! I noticed that there are >60K sequences, rather than the 51,130 noted in the supplementary table. Could you explain the source of the discrepancy? Could you also share the observed counts for individual sequences in R2 and R3?

wjs20 commented 4 years ago

Hi, I am also quite confused by the data. Could you upload a README, please?