KohlbacherLab / epytope

Python-based framework for computational immunomics
http://fred-2.github.io/
BSD 3-Clause "New" or "Revised" License

Notebooks + doc require porting to Python 3 #12

Closed lkuchenb closed 2 years ago

lkuchenb commented 3 years ago

The notebooks are still written in Python 2 and won't run with the ported library. The docs probably also contain some Python 2-specific examples.

antschum commented 3 years ago

Notes on Epytope Tutorials

1. CleavageAndTAPPrediction

Cell 4: The IDs and predicted PCM values differ from the ones in the legacy notebook. None of the IDs match the ones featured in the legacy notebook. I guess the names or data changed, but the values differ only by a small amount (e.g., -1.93 vs. -1.77).

Cell 5: Again, marginally different values and IDs compared to the legacy tutorial.

Cell 8: CleavageSitePredictorFactory using proteasmm_c returns marginally different values (e.g., -1.93 vs. -1.77).

Cell 11: Changed svmtap to smmtap, since svmtap is no longer available.

Cell 13: The legacy notebook demonstrated the function filter_result with [("svmtap", ge, -30)]. I updated svmtap to smmtap and changed the threshold to 1, since the values are mostly in the range of -1 to 1, but maybe another value makes even more sense.
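For readers unfamiliar with the expression format: filter_result takes a list of (method, comparator, threshold) tuples such as [("svmtap", ge, -30)], where ge is presumably operator.ge. The stdlib-only sketch below mimics that semantics on a plain dict of scores to show why -30 no longer filters anything once scores lie roughly between -1 and 1. It is not epytope's implementation; the function filter_scores, the dict layout, and all scores are hypothetical.

```python
import operator
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a prediction result: peptide -> {method: score}.
# All scores are made up for illustration; they are not epytope output.
Scores = Dict[str, Dict[str, float]]
Expression = Tuple[str, Callable[[float, float], bool], float]


def filter_scores(scores: Scores, expressions: List[Expression]) -> Scores:
    """Keep peptides whose scores satisfy every (method, comparator, threshold) expression."""
    return {
        peptide: per_method
        for peptide, per_method in scores.items()
        if all(
            method in per_method and comparator(per_method[method], threshold)
            for method, comparator, threshold in expressions
        )
    }


scores = {
    "SYFPEITHI": {"smmtap": 1.3},
    "LLGATCMFV": {"smmtap": 0.2},
    "FIASNGVKL": {"smmtap": -0.8},
}

# Legacy-style expression with svmtap swapped for smmtap and the threshold raised to 1.
kept = filter_scores(scores, [("smmtap", operator.ge, 1.0)])
print(sorted(kept))  # ['SYFPEITHI']

# With the old threshold of -30, every peptide passes, so the filter has no effect.
print(len(filter_scores(scores, [("smmtap", operator.ge, -30.0)])))  # 3
```

In this toy data, only a threshold inside the observed score range actually shrinks the peptide set, which is the property the tutorial's TAP filtering step needs.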

Cell 14: The legacy notebook again used svmtap for the predictions here; I changed it to smmtap. A sensible threshold for smmtap still needs to be chosen so that the TAP filter actually changes the number of peptides. I preliminarily changed -30 to 1, but I am not sure whether we want the highest smmtap scores. Also, I don’t have UniTope installed, so this part is commented out and should be rerun if UniTope is wanted.

Cell 15: The legacy notebook again used svmtap for the predictions here; I changed it to smmtap. A sensible threshold for smmtap still needs to be chosen so that the TAP filter actually changes the number of peptides. I preliminarily changed -30 to 1, but I am not sure whether we want the highest smmtap scores. I commented out the epitope prediction and filtering using SVMHC, since it is also no longer available. I was not sure which alternative method makes sense.

2. DBAdapter Usage

This tutorial is fine.

3. Epitope Prediction

Cell 4: read_fasta with type protein now returns the reference number contained in the file (e.g., “NP_852610.1”) instead of “Protein_0” as shown in the legacy notebook; this is probably defined in Core/Protein.
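If read_fasta now takes the record ID from the FASTA header (an assumption; this sketch was not checked against epytope's source), the displayed ID simply follows whatever the input file's headers contain. The minimal stdlib parser below uses the common convention of taking the first whitespace-delimited header token as the ID. The function read_records, the legacy-style header, the description, and the sequences are all hypothetical placeholders.

```python
from typing import List, Optional, Tuple


def read_records(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (id, sequence) pairs, using the first header token as the id."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            tokens = line[1:].split()
            header = tokens[0] if tokens else ""
            chunks = []
        elif line:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical inputs: an old-style header and a header carrying an accession.
OLD_FASTA = ">Protein_0\nMKTAYIAKQR\n"
NEW_FASTA = ">NP_852610.1 placeholder description\nMKTAYIAKQR\nQISFVKSHFS\n"

print([rec_id for rec_id, _ in read_records(OLD_FASTA)])  # ['Protein_0']
print([rec_id for rec_id, _ in read_records(NEW_FASTA)])  # ['NP_852610.1']
```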

Cell 6: Many more methods are listed; this might depend on which methods I had installed, so it might make sense to rerun this with only the necessary methods installed. The following methods are missing compared to legacy: unitope 1.0 (not installed by me) and svmhc (we removed this one).

Cell 7: results.head shows different examples but the values are the same.

Cell 12: Returns NaNs for all methods (bimas, sim, syfpeithi) for me.

Cell 13: Filter for a meaningful method and value. I updated the filter to syfpeithi (it was svmhc before, which was removed), but because all values are NaN, it returns an empty table.

Cell 15: The value 1.0 is predicted for all, while the legacy notebook returns 0 for (L, L, G, A, T, C, M, F, V) and (S, Y, F, P, E, I, T, H, I).

4. GeneratorUsage

Cell 5: Shows a user warning. All other values match.

Cell 7: Shows “BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.”
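The warning's two suggested fixes can be sketched in plain Python: either drop the trailing bases of an incomplete codon, or pad with trailing N bases until the length is a multiple of three. The helper names trim_to_codons and pad_with_n are hypothetical, and which option suits the notebook is left open; the resulting string could then be passed to Biopython's Seq(...).translate().

```python
def trim_to_codons(seq: str) -> str:
    """Drop trailing bases so that len(seq) becomes a multiple of three."""
    return seq[: len(seq) - len(seq) % 3]


def pad_with_n(seq: str) -> str:
    """Append trailing 'N' bases so that len(seq) becomes a multiple of three."""
    return seq + "N" * (-len(seq) % 3)


seq = "ATGGCCTTAG"  # 10 bases: one base past the last complete codon
print(trim_to_codons(seq))  # ATGGCCTTA
print(pad_with_n(seq))      # ATGGCCTTAGNN
```

Trimming silently discards the incomplete codon, while padding keeps it and lets the translation report an ambiguous residue; either choice removes the warning.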

5. HLA Typing

Cell 2: Could not run the cell successfully because OptiType is not installed.

6. Implementing new Methods

Nothing to run in this tutorial.

7. Installation

I believe I did not update anything except for the name here.

8. Polymorphic Epitope Prediction

Cell 4: Warnings that transcript NM_001293557 cannot be found and that the reference does not match the ref assigned to the variant.

Cell 5: Warning that it cannot find transcript NM_001293557.

9. Vaccine Design

Cell 3: BIMAS is not working for me: “No predictions could be made with bimas for given input. Check your epitope length and HLA allele combination.”

Could not run the remaining cells because OptiTope is not installed.

jonasscheid commented 2 years ago

I checked the issues Antonia raised by comparing the upcoming PR #42 against the legacy branch. I'll go over them step by step:

1. CleavageAndTAPPrediction

> Cell 4: The IDs and predicted PCM values differ from the ones in the legacy notebook. None of the IDs match the ones featured in the legacy notebook. I guess the names or data changed, but the values differ only by a small amount (e.g., -1.93 vs. -1.77).

The discrepancy probably arose when the proteins.fasta input file was updated. The current output makes more sense, because the correct ID and sequence are imported.

> Cell 5: Again, marginally different values and IDs compared to the legacy tutorial.

See comment above.

> Cell 8: CleavageSitePredictorFactory using proteasmm_c returns marginally different values (e.g., -1.93 vs. -1.77).

I don't observe that.

> Cell 13: The legacy notebook demonstrated the function filter_result with [("svmtap", ge, -30)]. I updated svmtap to smmtap and changed the threshold to 1, since the values are mostly in the range of -1 to 1, but maybe another value makes even more sense.

That is still to be discussed. I arbitrarily set the filter threshold to 1, as Antonia suggested; since it's a tutorial, I guess that is fine.

> Cell 14: The legacy notebook again used svmtap for the predictions here; I changed it to smmtap. A sensible threshold for smmtap still needs to be chosen so that the TAP filter actually changes the number of peptides. I preliminarily changed -30 to 1, but I am not sure whether we want the highest smmtap scores. Also, I don’t have UniTope installed, so this part is commented out and should be rerun if UniTope is wanted.

See comment above. UniTope is not supported anymore; it has been replaced by smm.

Cell 14: see comment above

Cell 15: Adjusted to new EpitopePredictionResult structure

3. Epitope Prediction

> Cell 4: read_fasta with type protein now returns the reference number contained in the file (e.g., “NP_852610.1”) instead of “Protein_0” as shown in the legacy notebook; this is probably defined in Core/Protein.

Yes, it is. Since the proteins.fasta file changed (as stated above), that changed as well.

> Cell 6: Many more methods are listed; this might depend on which methods I had installed, so it might make sense to rerun this with only the necessary methods installed. The following methods are missing compared to legacy: unitope 1.0 (not installed by me) and svmhc (we removed this one).

New tools have been added, e.g., the netmhc family, so that is fine.

> Cell 12: Returns NaNs for all methods (bimas, sim, syfpeithi) for me.

Not for me. The values are identical to the legacy branch.

> Cell 13: Filter for a meaningful method and value. I updated the filter to syfpeithi (it was svmhc before, which was removed), but because all values are NaN, it returns an empty table.

See comment above.

> Cell 15: The value 1.0 is predicted for all, while the legacy notebook returns 0 for (L, L, G, A, T, C, M, F, V) and (S, Y, F, P, E, I, T, H, I).

It's random, right?

5. HLA Typing

I cannot execute this, because installing OptiType via pip no longer works (pip doesn't support Python 2 anymore). Maybe I'm wrong here.

9. Vaccine Design

> Cell 3: BIMAS is not working for me: “No predictions could be made with bimas for given input. Check your epitope length and HLA allele combination.”

Worked for me.

The tutorials have been updated accordingly in PR #42.

christopher-mohr commented 2 years ago

Thanks @antschum and @jonasscheid! I will add comments to PR #42.

christopher-mohr commented 2 years ago

@jonasscheid What is missing here?