frankligy / DeepImmuno

Deep-learning empowered prediction and generation of immunogenic epitopes for T cell immunity
MIT License

Git Clone issue in Windows PC due to special character #4

Open awesome-crab opened 1 year ago

awesome-crab commented 1 year ago

Hello,

DeepImmuno might just be what I am looking for! Unfortunately, I cannot clone the project without error, because some file names contain special characters:

new_imgt_scraping/new_imgt/new_imgt/spiders/hla_paratope/HLA-C*0102.json

I am having trouble downloading the hla_paratope JSON files because they have an "*" in the file name.

How important are these files for running the code?

If they are really important, would it be possible to change the file names so that they don't contain any special characters?

Best regards and thank you for sharing!

frankligy commented 1 year ago

Hi @awesome-crab,

Thanks for reaching out! The whole folder /new_imgt_scraping is not required for running the program. It is only there for reproducibility, since I was asked how I curated the paratope information for each HLA allele.

That being said, I have never encountered a clone error for the DeepImmuno repository, and I just tested it again on my Mac. I would love to know more about this error so I can help with debugging. Alternatively, if the issue persists, you can download the zip file instead of using git clone, or use our web server (https://deepimmuno.research.cchmc.org/).

(base) EA21-00327:~ ligk2e$ cd ~/Desktop
(base) EA21-00327:Desktop ligk2e$ git clone https://github.com/frankligy/DeepImmuno.git
Cloning into 'DeepImmuno'...
remote: Enumerating objects: 1882, done.
remote: Counting objects: 100% (1882/1882), done.
remote: Compressing objects: 100% (1617/1617), done.
remote: Total 1882 (delta 336), reused 1800 (delta 261), pack-reused 0
Receiving objects: 100% (1882/1882), 36.54 MiB | 8.08 MiB/s, done.
Resolving deltas: 100% (336/336), done.
(base) EA21-00327:Desktop ligk2e$ cd ./DeepImmuno/
(base) EA21-00327:DeepImmuno ligk2e$ ls
LICENSE         data            deepimmuno-gan.py   models          reproduce
README.md       deepimmuno-cnn.py   extension       new_imgt_scraping   src

Let me know if I can help with anything else!

Best, Frank

awesome-crab commented 1 year ago

Hello @frankligy,

Thank you for your reply!

Just for completeness so you see my error:

Cloning into 'DeepImmuno'...
remote: Enumerating objects: 1882, done.
remote: Counting objects: 100% (1882/1882), done.
remote: Compressing objects: 100% (1617/1617), done.
remote: Total 1882 (delta 336), reused 1800 (delta 261), pack-reused 0
Receiving objects: 100% (1882/1882), 36.54 MiB | 2.38 MiB/s, done.
Resolving deltas: 100% (336/336), done.
error: invalid path 'new_imgt_scraping/new_imgt/new_imgt/spiders/hla_paratope/HLA-A*0101.json'
fatal: unable to checkout working tree
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'

The issue is in fact that Windows does not allow file names to include any of:

< > : " / \ | ? *

(There are also nonprintable characters which are not allowed.)

So, apart from the JSON files in the hla_paratope folder, everything is fine with your repo.
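To see which paths in a repository would trip over this, a check along the following lines might help. This is only a minimal sketch: the helper names are made up for illustration, the forbidden set is the usual Windows list of reserved characters plus control characters, and it does not cover other Windows restrictions such as reserved device names or trailing dots.

```python
from pathlib import PurePosixPath

# Characters Windows rejects in file names; control characters (0-31) are rejected too.
WINDOWS_FORBIDDEN = set('<>:"/\\|?*')


def is_windows_safe(name: str) -> bool:
    """Return True if a single path component contains no forbidden character."""
    return not any(ch in WINDOWS_FORBIDDEN or ord(ch) < 32 for ch in name)


def unsafe_paths(paths):
    """Yield relative, POSIX-style repository paths with an unsafe component."""
    for p in paths:
        if not all(is_windows_safe(part) for part in PurePosixPath(p).parts):
            yield p


if __name__ == "__main__":
    candidates = [
        "README.md",
        "new_imgt_scraping/new_imgt/new_imgt/spiders/hla_paratope/HLA-C*0102.json",
    ]
    for bad in unsafe_paths(candidates):
        print("not valid on Windows:", bad)
```

The list of paths could come from something like `git ls-tree -r --name-only HEAD` run on a machine where the clone works.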

To clone the project onto a Windows machine you need to do some magic. I was not able to do it, so I just cloned the code into a Docker container with a Linux environment.

Downloading a .zip is definitely a possibility, although I guess you then need to change the file names when unpacking the zip on Windows.
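Renaming while unpacking could be sketched roughly like this. It is an illustration only, not something from the DeepImmuno repo: the function names and the choice of "_" as the replacement character are assumptions, and any code that looks up the original file names (with "*") would need to be adjusted to the new names.

```python
import os
import re
import zipfile

# Characters Windows rejects inside a file name, including control characters.
_FORBIDDEN = re.compile(r'[<>:"\\|?*\x00-\x1f]')


def safe_member_name(name: str, replacement: str = "_") -> str:
    """Replace forbidden characters in every component of a zip member name."""
    # Drop empty, "." and ".." components so nothing escapes the target folder.
    parts = [p for p in name.split("/") if p not in ("", ".", "..")]
    return "/".join(_FORBIDDEN.sub(replacement, p) for p in parts)


def extract_with_safe_names(zip_path: str, dest_dir: str) -> list:
    """Extract all files from zip_path into dest_dir under sanitized names."""
    written = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            rel = safe_member_name(info.filename)
            if not rel:
                continue
            target = os.path.join(dest_dir, *rel.split("/"))
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with zf.open(info) as src, open(target, "wb") as dst:
                dst.write(src.read())
            written.append(target)
    return written
```

Note that two different names can collide after sanitizing (for example "A*01" and "A_01"), in which case the later file would overwrite the earlier one; the sketch does not guard against that.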

Maybe another question from my side: can you give me a tip on how I could validate DeepImmuno? We would like to base some decisions about mutations on the results of DeepImmuno, and if it turned out to be error-prone, well, that would not be amazing. I believe and hope that the code is perfect, but I would like to have proof for myself.

Best regards!

frankligy commented 1 year ago

Hi @awesome-crab

Thanks for letting me know! I indeed hadn't thought about that possibility on Windows. I will keep this issue open so other users can refer to it later.

For the validation question, as you may agree, no prediction tool can be perfect (although I really wish one could). There will always be false positive and false negative predictions among the results. But here are some thoughts I can share:

[1] DeepImmuno has pretty good sensitivity (we probably didn't emphasize that enough in the paper, but this is something I am relatively confident about), so if a peptide is predicted to be less than 0.5, I would say the chance of this peptide being immunogenic is very low. In contrast, my own opinion is that current tools (DeepImmuno along with others) are still not that good at specificity, which means that if a peptide is predicted to be above 0.5, there is a chance that it is a false positive.

[2] Since the publication, there have been a few independent benchmark studies that may give you some idea of the pros and cons of these approaches. One is this (https://www.biorxiv.org/content/10.1101/2022.03.14.484285v1.full), another is this (https://academic.oup.com/bib/article/23/3/bbac141/6573960). They are worth a look, as I may be biased toward my own research.

[3] I also want to point out this paper (https://www.nature.com/articles/s42256-021-00383-2), where they also consider the TCR sequence when predicting reactivity. I think this is useful because, without the actual TCR sequence, what we predict here is just a population-wise indicator of how likely a peptide is to be immunogenic, so this tool may provide more granular insights if you have a TCR sequence available.

[4] I am now actually conducting some experimental immunogenicity validation for another related project. Ultimately, this is the only way to validate a prediction, but prediction tools can help me narrow down the space to test experimentally.
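To make the sensitivity and specificity point in [1] concrete: if you have peptides with known experimental outcomes, you could score them and compute both numbers at the 0.5 cutoff. The sketch below is only an illustration. The function name is hypothetical, the scores and labels are made-up placeholders rather than DeepImmuno output, and treating a score of exactly 0.5 as a positive call is an assumption.

```python
def sensitivity_specificity(scores, labels, threshold=0.5):
    """Return (sensitivity, specificity) for scores against binary labels.

    A score >= threshold counts as a positive (immunogenic) call; treating
    exactly `threshold` as positive is an assumption of this sketch.
    """
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label and predicted_positive:
            tp += 1
        elif label and not predicted_positive:
            fn += 1
        elif not label and predicted_positive:
            fp += 1
        else:
            tn += 1
    # Sensitivity: fraction of truly immunogenic peptides that are called positive.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: fraction of non-immunogenic peptides that are called negative.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    # Made-up placeholder values, for illustration only.
    example_scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.6]
    example_labels = [1, 1, 1, 0, 0, 0]
    print(sensitivity_specificity(example_scores, example_labels))
```

A high sensitivity with a lower specificity on your own labeled peptides would match the behavior described in [1]: scores below 0.5 are fairly trustworthy as negatives, while scores above 0.5 still need experimental follow-up.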

Hoping this helps and happy to clarify anything further, Frank

awesome-crab commented 1 year ago

Hello @frankligy

Thanks for pointing me to these papers! That helps me a lot!

Let me know if I should rename the issue so that others can find it quickly, or feel free to rename it yourself.

Best regards!