Open jaekor91 opened 4 years ago
@igsaber -- Thank you for uploading the classification training data! I noticed that there are >60K sequences, rather than the 51,130 noted in the supplementary table. Could you explain the source of the discrepancy? Could you also share the observed counts for individual sequences in R2 and R3?
Hi, I am also quite confused by the data. Could you upload a README, please?
Hello, could you add a README to the data/training data folder with a description of what each file is for? Could you also add the original training data to this directory?
https://github.com/gifford-lab/antibody-2019/tree/master/data/training%20data/Hold%20out%20classification
Also, could you explain why there are "J" characters in the sequences in the following file (and others)? Are they used as padding tokens for NNs? If so, are the paddings added randomly?
https://raw.githubusercontent.com/gifford-lab/antibody-2019/master/data/training%20data/Full%20regression/Lucentis_b/data.tsv
Lastly, could you share the entire phagemid template sequence used? I am interested in looking at the entire antibody sequence available in addition to using the flanking sequence.
Thank you!
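On the "J" question above: in case it helps anyone else checking this, here is a minimal sketch that tallies where "J" occurs in each sequence (trailing only, leading only, or elsewhere). It assumes data.tsv is tab-separated with the sequence in the first column; that layout is a guess on my part, since the files are not documented. The helper names `j_position_summary` and `load_sequences` are my own and are not part of the repository.

```python
import csv
from collections import Counter


def j_position_summary(sequences, pad="J"):
    """Tally sequences by where the suspected padding character occurs."""
    summary = Counter()
    for seq in sequences:
        if pad not in seq:
            summary["none"] += 1
        elif pad not in seq.rstrip(pad):
            # Every pad character sits in one run at the end.
            summary["trailing_only"] += 1
        elif pad not in seq.lstrip(pad):
            # Every pad character sits in one run at the start.
            summary["leading_only"] += 1
        else:
            # Pads are inside the sequence or at both ends.
            summary["interior_or_mixed"] += 1
    return summary


def load_sequences(path, column=0):
    """Read one column from a tab-separated file.

    ASSUMPTION: the sequence is in column 0 and there is no header row.
    Adjust `column` (or skip the first row) once the real layout is known.
    """
    with open(path, newline="") as handle:
        return [row[column] for row in csv.reader(handle, delimiter="\t") if row]
```

Running `j_position_summary(load_sequences("data.tsv"))` on a local copy of the file should show whether the "J" characters are confined to one end (consistent with padding) or scattered through the sequences.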