Contribute Packages to Bioconductor

(inactive) cellassign #1178

Closed kieranrcampbell closed 4 years ago

kieranrcampbell commented 5 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR, WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

LTLA commented 4 years ago

Oh yeah, this SSL issue was another reason why I gave up on pip; I could never get that to work on Windows. Perhaps the following flags in the pip invocation would help:

--trusted-host=pypi.python.org --trusted-host=pypi.org --trusted-host=files.pythonhosted.org

But I don't know enough to say that this is a good thing to do in general. Sounds kinda risky.
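
For illustration, here is a minimal sketch of how those flags might be attached to a pip invocation from a script. The helper name `build_pip_command` and the example package `tensorflow` are assumptions made for this sketch, not something taken from cellassign or basilisk. It only builds the argument list and does not run anything. As noted above, marking hosts as trusted relaxes HTTPS verification for them, so treat it as a workaround rather than a recommendation.

```python
import sys

# Hosts copied from the flags suggested in this comment.
TRUSTED_HOSTS = ["pypi.python.org", "pypi.org", "files.pythonhosted.org"]


def build_pip_command(package, trusted_hosts=TRUSTED_HOSTS):
    """Return an argv list for `pip install` with each host marked as trusted.

    `build_pip_command` is a hypothetical helper for this sketch only.
    """
    command = [sys.executable, "-m", "pip", "install"]
    for host in trusted_hosts:
        # One --trusted-host flag per host, exactly as written above.
        command.append("--trusted-host=" + host)
    command.append(package)
    return command


if __name__ == "__main__":
    # Print the command instead of executing it.
    print(" ".join(build_pip_command("tensorflow")))
```

The resulting list could be passed to `subprocess.run` if one did decide to go down this route.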

LiNk-NY commented 4 years ago

Thanks for keeping an eye out on this. CC'ing @kieranrcampbell

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

2d10235 CHange install based on OS

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR, WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

LTLA commented 4 years ago

Well, at least the CondaVerificationErrors are the same as what BiocSklearn is eating:

http://bioconductor.org/checkResults/devel/bioc-LATEST/BiocSklearn/tokay2-buildsrc.html

CondaVerificationError: The package for scikit-learn located at C:\Users\BIOCBU~1\AppData\Local\me\basilisk\099~1.59\Cache\anaconda\pkgs\scikit-learn-0.22.2.post1-py37h7208079_0
appears to be corrupted. The path 'Lib/site-packages/sklearn/datasets/tests/data/openml/292/api-v1-json-data-list-data_name-australian-limit-2-data_version-1-status-deactivated.json.gz'
specified in the package manifest cannot be found.

This has been a persistent error for some time and is pretty cryptic to me; I'm just running reticulate::conda_install, for crying out loud! I would ordinarily think that some corrupted version has been cached from some previous iteration and is being re-used, but that seems unlikely as basilisk destroys its conda-related cache whenever its own version gets bumped. (Similarly, each client package will destroy its previous cached environment when its version is bumped.)

The next possibility is that the Windows builders (SPB and tokay2) are running out of disk space in the cache directory. Seems unlikely, but who knows. Maybe @lshep could take a look?

I would also mention that, thanks to some experiments from @vjcitn, we know that BiocSklearn has no inherent problems with running on Windows (well, none relative to the usual Windows-related shenanigans). So the CondaVerificationError may be specific to the BioC build system.

Edit: Many of these issues may be related to the fact that reticulate does not seem to activate the conda environment before doing stuff with it. This causes Windows to wander off and find any old thing in the PATH, linking to the wrong SSL libraries and causing a variety of errors. I would further speculate that the error also depends on the state of the Windows system libraries, which is why you don't see it in clean VM instances but instead in long-running servers.

Edit 2: 0.99.60 now sets CONDA_DLL_SEARCH_MODIFICATION_ENABLE=1 in an attempt to correct the path search, given that we can't really modify the bash session via activation.

Edit 3: Unbelievably, that environment variable seems to have worked; 0.99.60 builds on all systems and pip install passes its tests, which implies that the SSL problems are fixed.
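
The idea in Edit 2 can be sketched as follows. basilisk is an R package, and the thread does not show how it sets the variable, so this Python version is only an illustration; the helper name `conda_subprocess_env` is an assumption. It copies an environment and sets `CONDA_DLL_SEARCH_MODIFICATION_ENABLE` to `1` without touching the caller's own environment.

```python
import os


def conda_subprocess_env(base_env=None):
    """Return a copy of the environment with the conda DLL search flag set.

    `conda_subprocess_env` is a hypothetical helper for this sketch only.
    """
    # Copy so the caller's environment (or os.environ) is left unmodified.
    env = dict(os.environ if base_env is None else base_env)
    env["CONDA_DLL_SEARCH_MODIFICATION_ENABLE"] = "1"
    return env


if __name__ == "__main__":
    env = conda_subprocess_env()
    print(env["CONDA_DLL_SEARCH_MODIFICATION_ENABLE"])
```

The returned dictionary could then be handed to `subprocess.run(..., env=env)` when launching the conda-managed Python.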

LTLA commented 4 years ago

Wait. Wait. I realized what the problem was. Consider the full path of the first missing file:

C:\Users\pkgbuild\AppData\Local\me\basilisk\099~1.59\Cache\anaconda\pkgs\tensorflow-base-2.1.0-mkl_py37h230818c_0/Lib/site-packages/tensorflow-2.1.0.data/purelib/tensorflow_core/include/tensorflow_core/core/common_runtime/isolate_placer_inspection_required_ops_pass.h

Verily, it is beyond 260 characters, which is the file path limit on Windows! Ha!
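
A quick way to check this claim is to measure the path directly. The sketch below assumes the classic `MAX_PATH` value of 260, which counts the terminating NUL character; the function name `exceeds_max_path` is made up for illustration.

```python
# Classic Windows limit; the count includes the terminating NUL character.
MAX_PATH = 260

# The missing file quoted above, copied verbatim.
MISSING_FILE = (
    r"C:\Users\pkgbuild\AppData\Local\me\basilisk\099~1.59\Cache\anaconda"
    r"\pkgs\tensorflow-base-2.1.0-mkl_py37h230818c_0/Lib/site-packages"
    r"/tensorflow-2.1.0.data/purelib/tensorflow_core/include/tensorflow_core"
    r"/core/common_runtime/isolate_placer_inspection_required_ops_pass.h"
)


def exceeds_max_path(path, limit=MAX_PATH):
    """Return True if the path plus its NUL terminator does not fit in `limit`."""
    return len(path) + 1 > limit


if __name__ == "__main__":
    print(len(MISSING_FILE), exceeds_max_path(MISSING_FILE))
```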

Now, that is much, much harder to solve. The only part of that path that basilisk controls is:

me\basilisk\099~1.59\Cache\anaconda

Not much to cut off there. I could slim it to:

basilisk\099~1.59\0

... which gives us enough breathing space for the longest path (transposer_factory.h).

However, I suspect that this just kicks the can down the road; one can be pretty sure that the environments will bump the path length back up again. The relevant section becomes:

basilisk\099~1.59\cellassign-<cellassign's version here>\<env name here>

I guess I could further change behavior on Windows so that we instead get:

cellassign\<cellassign's version here>\<env name here>

... which is probably fine, so long as you keep your environment names short. Like, really short.
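
To get a feel for how much room the shorter base directory buys, here is a small sketch that swaps the old basilisk-controlled prefix for the slimmed one in the path quoted earlier and compares lengths. The function name `shorten` is hypothetical, and the sketch only manipulates strings; it says nothing about how basilisk lays out its directories internally.

```python
MAX_PATH = 260  # classic Windows limit, including the terminating NUL

# The part of the path that basilisk controls, before and after slimming.
OLD_PREFIX = r"me\basilisk\099~1.59\Cache\anaconda"
NEW_PREFIX = r"basilisk\099~1.59\0"

# The missing file quoted earlier in this comment, copied verbatim.
MISSING_FILE = (
    r"C:\Users\pkgbuild\AppData\Local\me\basilisk\099~1.59\Cache\anaconda"
    r"\pkgs\tensorflow-base-2.1.0-mkl_py37h230818c_0/Lib/site-packages"
    r"/tensorflow-2.1.0.data/purelib/tensorflow_core/include/tensorflow_core"
    r"/core/common_runtime/isolate_placer_inspection_required_ops_pass.h"
)


def shorten(path, old=OLD_PREFIX, new=NEW_PREFIX):
    """Replace the first occurrence of the old prefix with the new one."""
    return path.replace(old, new, 1)


if __name__ == "__main__":
    shortened = shorten(MISSING_FILE)
    # Characters saved, and whether the result now fits under MAX_PATH.
    print(len(MISSING_FILE) - len(shortened), len(shortened) + 1 <= MAX_PATH)
```

The saving is the difference in prefix lengths (16 characters for the two prefixes above), which is why adding a client name, version and environment name on top can easily eat it back up.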

A "better" solution would be to put the files somewhere else other than the cache. But there's really nowhere to go from C:\Users\pkgbuild\AppData\Local without hitting system dirs!

Edit: 0.99.61 now has shorter paths for the base directory. I haven't changed the environment path structure yet - though I add the client's name, version and environment, I think we also lose the Python package name (tensorflow-base-2.1.0-mkl_py37h230818c_0) which should be a better-than-even trade in most cases. So let's see how it goes; I have a standby solution in the windows branch of basilisk that mimics a registry of "installed" environments, but I'd rather not use it.

vjcitn commented 4 years ago

This is from https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file?redirectedfrom=MSDN#maximum_path_length. Have you run into this solution?

The Windows API has many functions that also have Unicode versions to permit an extended-length path for a maximum total path length of 32,767 characters. This type of path is composed of components separated by backslashes, each up to the value returned in the lpMaximumComponentLength parameter of the GetVolumeInformation function (https://docs.microsoft.com/en-us/windows/desktop/api/FileAPI/nf-fileapi-getvolumeinformationa); this value is commonly 255 characters. To specify an extended-length path, use the "\\?\" prefix. For example, "\\?\D:\very long path".

Note

The maximum path of 32,767 characters is approximate, because the "\\?\" prefix may be expanded to a longer string by the system at run time, and this expansion applies to the total length.
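
As a rough sketch of what using that prefix involves, the snippet below turns an absolute drive-letter path into its extended-length form. The function name `to_extended_length` is an assumption for illustration, and UNC paths are deliberately left out. It uses Python's `ntpath` module so that the string handling behaves the same on any platform. Forward slashes are normalized to backslashes first, since the quoted text describes these paths as components separated by backslashes.

```python
import ntpath

# The extended-length prefix: backslash, backslash, question mark, backslash.
PREFIX = "\\\\?\\"


def to_extended_length(path):
    """Return `path` with the extended-length prefix added.

    Hypothetical helper: only absolute drive-letter paths are handled.
    """
    if path.startswith(PREFIX):
        return path  # already in extended-length form
    drive, rest = ntpath.splitdrive(path)
    is_drive_letter = len(drive) == 2 and drive[1] == ":"
    if not is_drive_letter or not rest.startswith(("\\", "/")):
        raise ValueError("expected an absolute drive-letter path: %r" % path)
    # normpath converts forward slashes to backslashes and collapses "..".
    return PREFIX + ntpath.normpath(path)


if __name__ == "__main__":
    print(to_extended_length("C:\\Users\\pkgbuild/AppData/Local"))
```

Whether a given tool accepts such paths is a separate question; this only shows the string transformation.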

vjcitn commented 4 years ago

Another option: have the registry on the build system edited to allow long paths, and give users advice on how to do it themselves.
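
For anyone who wants to check the current setting before editing anything, here is a read-only sketch. It assumes the setting in question is the `LongPathsEnabled` value under `HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem`, which is the standard Windows 10 location but is not named in this thread. The function name `long_paths_enabled` is made up for illustration. On non-Windows systems it simply returns `None`.

```python
try:
    import winreg  # standard library, available on Windows only
except ImportError:
    winreg = None

# Assumed location of the long-path setting (not stated in this thread).
KEY_PATH = r"SYSTEM\CurrentControlSet\Control\FileSystem"
VALUE_NAME = "LongPathsEnabled"


def long_paths_enabled():
    """Return True/False for the registry setting, or None when not on Windows."""
    if winreg is None:
        return None
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
            value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
    except OSError:
        # Key or value missing: treat as not enabled.
        return False
    return value == 1


if __name__ == "__main__":
    print(long_paths_enabled())
```

Note that even with this value set, individual programs may also need to opt in to long paths, so it may not be sufficient on its own.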

https://www.howtogeek.com/266621/how-to-make-windows-10-accept-file-paths-over-260-characters/

bioc-issue-bot commented 4 years ago

Received a valid push; starting a build. Commits are:

ea5329e Bump version number

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR, WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

LTLA commented 4 years ago

@vjcitn

Re. Unicode paths, I didn't know that, but I see that the conda folks ran into this same problem, and I didn't see this recommendation anywhere, so I assume that there is a reason why this doesn't work in general. Maybe it relies on a too-recent version of the Windows API? I dunno.

Re. registry, sounds a bit dangerous to me. Definitely a last-ditch option.

vince commented 4 years ago

@LTLA wrong Vince, friend.

LTLA commented 4 years ago

Sorry, got mixed up between @vjcitn's GitHub and Slack handles!

LiNk-NY commented 4 years ago

BTW @LTLA, have you tried using AppVeyor for Windows CI / tests?

LTLA commented 4 years ago

Nope, I just use BioC's servers for all of my CI needs.

LTLA commented 4 years ago

Just to give an update: the path length modification was added to 0.99.61 but has not propagated, being held up by unpredictable failures on tokay2. I hope it will get through to give enough time for cellassign testing, but if not, you may consider not supporting Windows for now.

lshep commented 4 years ago

What is the status of this package?

kieranrcampbell commented 4 years ago

I'm hoping that now basilisk is established on Bioconductor, we can have another go and decide whether to support Windows.

LiNk-NY commented 4 years ago

Hi Kieran, @kieranrcampbell

I think you can have another go. Aaron has made progress on getting it to build successfully on all platforms: https://community-bioc.slack.com/archives/CEQ04GKEC/p1589834161387300

http://bioconductor.org/checkResults/devel/bioc-LATEST/basilisk/

bioc-issue-bot commented 4 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS, skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

LTLA commented 4 years ago

Oh goody. At last, an error that isn't basilisk's fault.

Needless to say, I think you can get rid of the .onLoad statement now, given that you are now in control of your own destiny with respect to tensorflow's presence on the system. (Putting the check in the .onLoad doesn't seem like a great idea in the first place, but whatever.)

LiNk-NY commented 4 years ago

@kieranrcampbell Any updates?

kieranrcampbell commented 4 years ago

Hi @LiNk-NY

I'll take a look at this shortly. I plan to have it wrapped up for the next Bioc release now that basilisk is in.

Thanks

Kieran

LiNk-NY commented 4 years ago

Hi Kieran, @kieranrcampbell

Any updates on the package? Otherwise I'm forced to close the issue until it's ready. Thanks! -Marcel

bioc-issue-bot commented 4 years ago

This issue is being closed because there has been no progress for an extended period of time. You may reopen the issue when you have the time to actively participate in the review / submission process. Please also keep in mind that a package accepted to Bioconductor requires a commitment on your part to ongoing maintenance.

Thank you for your interest in Bioconductor.