Closed bcorrie closed 7 months ago
So I had a look at this data, and sure enough, the sequences aren't there. So this should be reported as a NOT CREDIBLE Repertoire, but it shouldn't crash. 8-)
I need to look into why the sequences aren't there, as there are sequence strings for the junction and there are VDJ annotations. There are not a lot of fields populated so it is likely the annotation tool that was used didn't provide many fields???
Confirmed that the authors of the study provided us with the data directly (we did not run the annotation pipeline), and the sequence was not included in the annotated data.
We should check if the new airr
R package validation function handles this. With version1.5.0
, airr::validate_rearrangement(repertoire)
returns TRUE
for this repertoire, when it should be FALSE
.
Looks like it only checks that the field is present, not that it is filled. Which makes sense, because the schema is written to be forgiving of missing data. We should add this check to RepCred.
Fixed in 307775210705abad7fbe22a835749347b6374c76. @bcorrie Please test in iReceptor once the container is ready
@ssnn-airr this doesn't seem to fix it...
I am not using your docker hub directly, as I need to add some things to the container that are iReceptor specific. I have a clone of your github here: https://github.com/sfu-ireceptor/rep-cred/tree/ireceptor
Basically what I do is sync my clone master
to your master
, merge my master
into my ireceptor
branch, and then build the docker container locally. I then push to the iReceptor docker hub. I then need to build a singularity container from that.
This has worked in the past, but when I built the container it said it used everything from cache. I would have expected it to at least rebuild the repcred layer.
$ sudo docker build -t ireceptor/ir_repcred:ireceptor ./docker
[+] Building 8.4s (19/19) FINISHED docker:default
=> [internal] load build definition from Dockerfile 0.6s
=> => transferring dockerfile: 2.30kB 0.0s
=> [internal] load metadata for docker.io/rocker/shiny:4.3.3 2.3s
=> [auth] rocker/shiny:pull token for registry-1.docker.io 0.0s
=> [internal] load .dockerignore 0.7s
=> => transferring context: 2B 0.0s
=> [ 1/14] FROM docker.io/rocker/shiny:4.3.3@sha256:6ca7c744cdeb1e92bd02 0.0s
=> CACHED [ 2/14] RUN apt-get update 0.0s
=> CACHED [ 3/14] RUN apt-get install -y zip 0.0s
=> CACHED [ 4/14] RUN apt-get install -y python3-pip python3-venv 0.0s
=> CACHED [ 5/14] RUN python3 -m venv /ireceptor 0.0s
=> CACHED [ 6/14] RUN pip3 install wheel 0.0s
=> CACHED [ 7/14] RUN pip3 install airr 0.0s
=> CACHED [ 8/14] RUN pip3 install anndata 0.0s
=> CACHED [ 9/14] RUN export DEBIAN_FRONTEND=noninteractive && apt-get u 0.0s
=> CACHED [10/14] RUN Rscript -e "install.packages(c('devtools', 0.0s
=> CACHED [11/14] RUN Rscript -e "BiocManager::install(c('Biostrings', ' 0.0s
=> CACHED [12/14] RUN Rscript -e "devtools::install_github('airr-communi 0.0s
=> CACHED [13/14] RUN Rscript -e "repcred<-system.file('repcred.R', pack 0.0s
=> CACHED [14/14] WORKDIR /home 0.0s
=> exporting to image 0.6s
=> => exporting layers 0.0s
=> => writing image sha256:b77478f99567c9d3cec30dfd10721e516bd34ebebf842 0.3s
=> => naming to docker.io/ireceptor/ir_repcred:ireceptor
I would have expected it to at least have rebuilt the repcred library but it used it from cache as well:
=> CACHED [12/14] RUN Rscript -e "devtools::install_github('airr-communi 0.0s
So I don't think it detected that the github had changed - not sure how that all works in R...
When I pushed to docker hub it said all the layers already existed.
I changed it to install from sfu-ireceptor/rep-cred:ireceptor (which it probably should be anyway) and it updated and pushed a new image. You might want to check and confirm that the docker image has the new code for the AIRR docker hub image.
I am building my new singularity image now, will test it shortly.
My new image works with these Repertoires. I see that essentially no analysis is done because there is no sequence. I can understand that saying such a Repertoire is not very credible, but the data does contain Junction AA sequence and VDJ calls. I know there are other analysis steps in repcred that look at those fields. Do they also require sequence?
Fundamentally I think this issue can be closed, the only question would be whether there is some sort of credibility checking could go on without the sequence?
Hmmm, hold on a second. All of my repcred runs are saying there is no sequence_alignment now: "Required sequence_alignment is empty. Exiting analysis."
I need to check on some that we know worked before.
I will change this
Hmm, this doesn't happen in all repertoires.
repertoire_id
: 64189b307c7dc3feb341f978 - Has sequence
field, but get "Required sequence_alignment is empty. Exiting analysis."repertoire_id
: 5f482d99c076e59ba67f8172 - This one works fine. So not all jobs are failing with that error, but I thought most tests could run without sequence_alignment
, it was a missing sequence
that was critical???
I changed the code to only skip the analysis where sequence_alignment
. This should work now.
Yes, this appears fixed. For 64189b307c7dc3feb341f978
I get:
sequence_alignment
and doesn't produce anything
-- No more section 4 subsectionssequence_alignment
and doesn't produce anythingFor 5f482d99c076e59ba67f8172
I get a much more complete report with no sections reporting a problem with sequence_alignment
So assuming the above is to be expected, this issue looks resolved.
Non negative sequence length (???) on this repertoire:
Study ID: PRJNA633317 Study group: Control (Healthy) Subject ID: W Sample ID: W_2018 PCR target: TRA Repository: covid19-2.ireceptor.org Repertoire ID: 5edd817b7dd6ee5ee410772b