repcred crashes with non-negative sequence length error.

bcorrie commented 2 months ago

Non negative sequence length (???) on this repertoire:

Study ID: PRJNA633317 Study group: Control (Healthy) Subject ID: W Sample ID: W_2018 PCR target: TRA Repository: covid19-2.ireceptor.org Repertoire ID: 5edd817b7dd6ee5ee410772b

processing file: _main.Rmd
Quitting from lines 233-260 (_main.Rmd)
Error in `ggplot2::geom_histogram()`:
! Problem while computing stat.
ℹ Error occurred in the 1st layer.
Caused by error in `seq_len()`:
! argument must be coercible to non-negative integer
Backtrace:
1. base::local(...)
8. bookdown (local) ``(...)
9. bookdown:::render_cur_session(...)
10. rmarkdown::render(main, output_format, ..., clean = clean, envir = envir)
11. knitr::knit(knit_input, knit_output, envir = envir, quiet = quiet)
...
46. ggplot2 (local) compute_statistic(..., self = self)
47. self$stat$compute_layer(data, self$computed_stat_params, layout)
48. ggplot2 (local) compute_layer(..., self = self)
49. ggplot2:::dapply(...)
50. ggplot2:::split_with_index(seq_len(nrow(df)), ids)
Execution halted

bcorrie commented 2 months ago

So I had a look at this data, and sure enough, the sequences aren't there. So this should be reported as a NOT CREDIBLE Repertoire, but it shouldn't crash. 8-)

I need to look into why the sequences aren't there, as there are sequence strings for the junction and there are VDJ annotations. There are not a lot of fields populated so it is likely the annotation tool that was used didn't provide many fields???

bcorrie commented 2 months ago

Confirmed that the authors of the study provided us with the data directly (we did not run the annotation pipeline), and the sequence was not included in the annotated data.

ssnn-airr commented 2 months ago

We should check if the new airr R package validation function handles this. With version1.5.0 , airr::validate_rearrangement(repertoire) returns TRUE for this repertoire, when it should be FALSE.

scharch commented 2 months ago

Looks like it only checks that the field is present, not that it is filled. Which makes sense, because the schema is written to be forgiving of missing data. We should add this check to RepCred.

ssnn-airr commented 2 months ago

Fixed in 307775210705abad7fbe22a835749347b6374c76. @bcorrie Please test in iReceptor once the container is ready

bcorrie commented 2 months ago

@ssnn-airr this doesn't seem to fix it...

I am not using your docker hub directly, as I need to add some things to the container that are iReceptor specific. I have a clone of your github here: https://github.com/sfu-ireceptor/rep-cred/tree/ireceptor

Basically what I do is sync my clone master to your master, merge my master into my ireceptor branch, and then build the docker container locally. I then push to the iReceptor docker hub. I then need to build a singularity container from that.

This has worked in the past, but when I built the container it said it used everything from cache. I would have expected it to at least rebuild the repcred layer.

$ sudo docker build -t ireceptor/ir_repcred:ireceptor ./docker
[+] Building 8.4s (19/19) FINISHED                               docker:default
 => [internal] load build definition from Dockerfile                       0.6s
 => => transferring dockerfile: 2.30kB                                     0.0s
 => [internal] load metadata for docker.io/rocker/shiny:4.3.3              2.3s
 => [auth] rocker/shiny:pull token for registry-1.docker.io                0.0s
 => [internal] load .dockerignore                                          0.7s
 => => transferring context: 2B                                            0.0s
 => [ 1/14] FROM docker.io/rocker/shiny:4.3.3@sha256:6ca7c744cdeb1e92bd02  0.0s
 => CACHED [ 2/14] RUN apt-get update                                      0.0s
 => CACHED [ 3/14] RUN apt-get install -y zip                              0.0s
 => CACHED [ 4/14] RUN apt-get install -y python3-pip python3-venv         0.0s
 => CACHED [ 5/14] RUN python3 -m venv /ireceptor                          0.0s
 => CACHED [ 6/14] RUN pip3 install wheel                                  0.0s
 => CACHED [ 7/14] RUN pip3 install airr                                   0.0s
 => CACHED [ 8/14] RUN pip3 install anndata                                0.0s
 => CACHED [ 9/14] RUN export DEBIAN_FRONTEND=noninteractive && apt-get u  0.0s
 => CACHED [10/14] RUN Rscript -e "install.packages(c('devtools',          0.0s
 => CACHED [11/14] RUN Rscript -e "BiocManager::install(c('Biostrings', '  0.0s
 => CACHED [12/14] RUN Rscript -e "devtools::install_github('airr-communi  0.0s
 => CACHED [13/14] RUN Rscript -e "repcred<-system.file('repcred.R', pack  0.0s
 => CACHED [14/14] WORKDIR /home                                           0.0s
 => exporting to image                                                     0.6s
 => => exporting layers                                                    0.0s
 => => writing image sha256:b77478f99567c9d3cec30dfd10721e516bd34ebebf842  0.3s
 => => naming to docker.io/ireceptor/ir_repcred:ireceptor

I would have expected it to at least have rebuilt the repcred library but it used it from cache as well:

 => CACHED [12/14] RUN Rscript -e "devtools::install_github('airr-communi  0.0s

So I don't think it detected that the github had changed - not sure how that all works in R...

When I pushed to docker hub it said all the layers already existed.

I changed it to install from sfu-ireceptor/rep-cred:ireceptor (which it probably should be anyway) and it updated and pushed a new image. You might want to check and confirm that the docker image has the new code for the AIRR docker hub image.

bcorrie commented 2 months ago

I am building my new singularity image now, will test it shortly.

bcorrie commented 2 months ago

My new image works with these Repertoires. I see that essentially no analysis is done because there is no sequence. I can understand that saying such a Repertoire is not very credible, but the data does contain Junction AA sequence and VDJ calls. I know there are other analysis steps in repcred that look at those fields. Do they also require sequence?

bcorrie commented 2 months ago

Fundamentally I think this issue can be closed, the only question would be whether there is some sort of credibility checking could go on without the sequence?

bcorrie commented 2 months ago

Hmmm, hold on a second. All of my repcred runs are saying there is no sequence_alignment now: "Required sequence_alignment is empty. Exiting analysis."

I need to check on some that we know worked before.

ssnn-airr commented 2 months ago

I will change this

bcorrie commented 2 months ago

Hmm, this doesn't happen in all repertoires.

repertoire_id: 64189b307c7dc3feb341f978 - Has sequence field, but get "Required sequence_alignment is empty. Exiting analysis."
repertoire_id: 5f482d99c076e59ba67f8172 - This one works fine.

So not all jobs are failing with that error, but I thought most tests could run without sequence_alignment, it was a missing sequence that was critical???

ssnn-airr commented 2 months ago

I changed the code to only skip the analysis where sequence_alignment. This should work now.

bcorrie commented 2 months ago

Yes, this appears fixed. For 64189b307c7dc3feb341f978 I get:

Section 2 - In the PDF the list of missing fields is all on one line and not readable
Section 3 - Reports no non-nucleotides - so appears to work.
Section 4 -- 4.1 productive report seems good -- 4.2 reports no sequence_alignment and doesn't produce anything -- No more section 4 subsections
Section 5 - annotation call stats work
Section 6 - reports no sequence_alignment and doesn't produce anything
Section 7 - Chimerism report works
Section 8 - there are no references, not sure if there should be

For 5f482d99c076e59ba67f8172 I get a much more complete report with no sections reporting a problem with sequence_alignment

So assuming the above is to be expected, this issue looks resolved.

airr-community / rep-cred

repcred crashes with non-negative sequence length error. #32