Closed svonworl closed 2 weeks ago
Attention: Patch coverage is 66.97248%
with 36 lines
in your changes missing coverage. Please review.
Project coverage is 74.36%. Comparing base (
74674f8
) to head (ebb3e5a
).
Files | Patch % | Lines |
---|---|---|
...tore/webservice/core/LimitedSourceFileBuilder.java | 10.52% | 34 Missing :warning: |
...re/webservice/languages/LanguagePluginHandler.java | 81.81% | 1 Missing and 1 partial :warning: |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
increases the maximum size limit of a Jupyter notebook file to 3MB.
From a different conversation, the limit per file is possibly less important than the limit per version (or even per workflow?). Link follow-up ticket here if spun off
Thus, in a followup PR, I propose to modify SourceFile so that we can differentiate a normal SourceFile from one that didn't meet our requirements.
:+1:
Ok, I overhauled the PR to address most of the feedback. Please refer to the "Update" section at the top of the PR description for more information.
Failed conditions
63.2% Coverage on New Code (required ≥ 80%)
Update
Alrighteee... the original PR has been overhauled. Important changes and observations:
SourceFile
size/binary limiting code is now inLimitedSourceFileBuilder
, which demonstrates the "step builder" pattern: http://rdafbn.blogspot.com/2012/10/step-builder-pattern-definition-and.htmlSourceFile
.SourceFile
that will signal that the file is complete, the content is an error message, etc.: https://ucsc-cgl.atlassian.net/browse/SEAB-6488 https://ucsc-cgl.atlassian.net/browse/SEAB-6489Description Whilst diagnosing https://ucsc-cgl.atlassian.net/browse/SEAB-6423, we discovered that when registering a 2.5MB notebook file, the corresponding SourceFile's content is
null
. This causes the UI "Preview" notebook tab to report that the notebook does not have any content.Investigation revealed that the webservice only enforces the maximum size and non-binary SourceFile requirements on the old entry registration path. On the .dockstore.yml registration path, SourceFiles are still effectively limited, but only because:
To address the above, this PR:
create
helper method.If we consistently use these SourceFile helper functions, we make it easy to tweak the SourceFile creation process (to add a new requirement, for example) and ensure that, should we modify the SourceFile object itself, all fields get copied correctly.
As before, when a file doesn't meet the size or non-binary requirements, we set the corresponding SourceFile's content to an error message. For example, "Dockstore does not store binary files". However, code that subsequently uses the SourceFile has no way to differentiate the error message from valid file content. So, after this PR, when a "too large" notebook is registered, the notebook's "Preview" tab will try to parse the error message as a Jupyter notebook, and fail in a different way.
Thus, in a followup PR, I propose to modify SourceFile so that we can differentiate a normal SourceFile from one that didn't meet our requirements. Perhaps we store the error message in a different field, or add a flag or enum that differentiates "COMPLETE" and "ERROR" content. An enum would allow us to support other states, such as a "SUMMARY" state, which would describe an abbreviated form of the full content (perhaps useful if we diversify into indexing AI models, which we might not want [or be able] to store in complete form).
It can be argued that [some of?] the helpers should be methods of SourceFile, etc. If anyone feels super-strongly, we can tweak. There's a rationale behind the current design, but I've already typed too much in this description, so I won't dump it here.
This PR is a little bit scary from a reliability/correctness point of view. The new code passes our existing tests, and appears to be covered. However, given the variety of states and contexts involved, it's very difficult to confirm that we haven't introduced a bug, without spending tons of time auditing the existing tests and filling any holes. Given the way surrounding code is currently architected, it's hard to test, which makes verifying correctness even more challenging. It'll help to visually inspect very carefully and be on the lookout for sourcefile-related problems after we deploy to qa, but it's still a little risky.
Review Instructions Register the notebook which is linked in the ticket, and confirm that Dockstore successfully read it, by visiting the notebook page and confirming that the Preview is correct.
Issue https://ucsc-cgl.atlassian.net/browse/SEAB-6423
Security and Privacy
No concerns.
e.g. Does this change...
Please make sure that you've checked the following before submitting your pull request. Thanks!
mvn clean install
@RolesAllowed
annotation