Closed: elzinga-ibm-research closed this 1 year ago
I set out to get a cpu-only installation working. I know it simply "should work" from the existing conda.yml . But I tried on a variety of environments (Ubuntu 20.04 Intel, and RHEL8 Intel), and a few colleagues also tried (Anthony Annunziata, Phil Downey), and we could not get it to work. (The failure mode is that it installs a CUDA-requiring version of PyTorch. This causes the very first import in README.md to error out.)
A tiny tweak to conda.yml DOES get it to work (moving `cpuonly` to its own line). Another tiny tweak to requirements.txt gets "Generative models on mars!" working, from notebooks/diffusion-demo.ipynb (restricting transformers slightly more than it already was: `<=4.24.0`, formerly `<4.26.0`).
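For reference, a minimal sketch of what the tweaked dependency section could look like (the surrounding entries are placeholders, not the actual file contents):

```yaml
# conda.yml (sketch) -- list cpuonly as its own dependency entry
# so the conda solver selects the CPU build of PyTorch.
dependencies:
  - python=3.8        # placeholder version
  - pytorch           # resolved to a CPU build when cpuonly is present
  - cpuonly           # on its own line, as its own entry
```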
I also recommend (for now) slightly upper-bounding scipy. Currently `pip check` reports: "deepchem 2.7.1 has requirement scipy<1.9, but you have scipy 1.10.1"
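The conflict that `pip check` reports can be verified mechanically with the `packaging` library; a small sketch using the versions quoted above:

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

# deepchem 2.7.1 declares scipy<1.9; the environment resolved scipy 1.10.1.
deepchem_requirement = SpecifierSet("<1.9")
installed = Version("1.10.1")

# 1.10 sorts after 1.9 under version ordering, so the pin is violated.
print(installed in deepchem_requirement)  # False
```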
I know this seems small, but I tried many many things before I hit on this recipe to get cpuonly working again. I hope the team likes it.
Hi Dean, thanks! It's a bit weird you had problems on Ubuntu 20.04 and RHEL 8, as our CI tests the first configuration (tests have been passing with no issues over the last few days) and we have recent RHEL 8 builds (the latest last week). Anyway, as long as the unit tests pass, I think we can merge the PR.
Regarding the transformers upper bound, I'm looping in @christofid for feedback. I don't see a major drawback, but I'm surprised this was a problem, as all the unit tests were passing.
Hi Dean, thanks! I am not sure why the diffusion-demo has issues with the current transformers requirement. diffusers requires transformers>=4.21.0, so in theory it should be fine. Could you post or describe the issue you face with diffusers? It could be useful for the future. Other than that, it looks good to me!
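The bounds in play can be cross-checked mechanically; a sketch with the `packaging` library, using the constraints quoted in this thread (diffusers' `>=4.21.0` floor and the `<=4.24.0` cap proposed in this PR):

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

diffusers_needs = SpecifierSet(">=4.21.0")  # floor declared by diffusers
proposed_cap = SpecifierSet("<=4.24.0")     # cap proposed in this PR

candidate = Version("4.24.0")
print(candidate in diffusers_needs and candidate in proposed_cap)  # True: both satisfied
print(Version("4.25.1") in proposed_cap)                           # False: excluded by the cap
```

So the proposed pin still leaves a range (4.21.0 through 4.24.0) that satisfies diffusers' declared requirement.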
Hi Dimitrios, sure, here is the issue with diffusion-demo that is alleviated by requiring e.g. `transformers<4.25.1`. It is reproducible on both Ubuntu 20.04 and RHEL8 AWS instances, which I am using to study gt4sd and its models.
In both cases (error and fix), this is the diffusers version installed: `diffusers 0.6.0`.
At this cell:

```python
prompt = "Generative models on mars!"

algorithm = ApplicationsRegistry.get_application_instance(
    target=None,
    algorithm_type='generation',
    domain='vision',
    algorithm_name='DiffusersGenerationAlgorithm',
    algorithm_application='StableDiffusionGenerator',  # authenticate on huggingface
    prompt=prompt,
)
```
... this error occurs (the full stack trace from Jupyter is included below):
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <module>:3 │
│ │
│ 1 prompt = "Generative models on mars!" │
│ 2 │
│ ❱ 3 algorithm = ApplicationsRegistry.get_application_instance( │
│ 4 │ target=None, │
│ 5 │ algorithm_type='generation', │
│ 6 │ domain='vision', │
│ │
│ /home/ec2-user/gt4sd-core/src/gt4sd/algorithms/registry.py:359 in get_application_instance │
│ │
│ 356 │ │ } │
│ 357 │ │ parameters.update(kwargs) │
│ 358 │ │ │
│ ❱ 359 │ │ return application_tuple.algorithm_class( │
│ 360 │ │ │ configuration=application_tuple.configuration_class(**parameters), │
│ 361 │ │ │ target=target, │
│ 362 │ │ ) │
│ │
│ /home/ec2-user/gt4sd-core/src/gt4sd/algorithms/generation/diffusion/core.py:69 in __init__ │
│ │
│ 66 │ │ configuration = self.validate_configuration(configuration) │
│ 67 │ │ # TODO there might also be a validation/check on the target input │
│ 68 │ │ │
│ ❱ 69 │ │ super().__init__( │
│ 70 │ │ │ configuration=configuration, │
│ 71 │ │ │ target=None, # type:ignore │
│ 72 │ │ ) │
│ │
│ /home/ec2-user/gt4sd-core/src/gt4sd/algorithms/core.py:109 in __init__ │
│ │
│ 106 │ │ logger.info( │
│ 107 │ │ │ f"runnning {self.__class__.__name__} with configuration={configuration}" │
│ 108 │ │ ) │
│ ❱ 109 │ │ generator = self.get_generator(configuration, target) │
│ 110 │ │ setattr( │
│ 111 │ │ │ self, │
│ 112 │ │ │ "generate", │
│ │
│ /home/ec2-user/gt4sd-core/src/gt4sd/algorithms/generation/diffusion/core.py:90 in get_generator │
│ │
│ 87 │ │ │
│ 88 │ │ logger.info("ensure artifacts for the application are present.") │
│ 89 │ │ self.local_artifacts = configuration.ensure_artifacts() │
│ ❱ 90 │ │ implementation: Generator = configuration.get_conditional_generator( # type: ig │
│ 91 │ │ │ self.local_artifacts │
│ 92 │ │ ) │
│ 93 │ │ return implementation.sample │
│ │
│ /home/ec2-user/gt4sd-core/src/gt4sd/algorithms/generation/diffusion/core.py:136 in │
│ get_conditional_generator │
│ │
│ 133 │ ) │
│ 134 │ │
│ 135 │ def get_conditional_generator(self, resources_path: str, **kwargs) -> Generator: │
│ ❱ 136 │ │ return Generator( │
│ 137 │ │ │ resources_path=resources_path, │
│ 138 │ │ │ model_type=self.model_type, │
│ 139 │ │ │ model_name=self.algorithm_version, │
│ │
│ /home/ec2-user/gt4sd-core/src/gt4sd/algorithms/generation/diffusion/implementation.py:120 in │
│ __init__ │
│ │
│ 117 │ │ self.scheduler_type = scheduler_type │
│ 118 │ │ self.prompt = prompt │
│ 119 │ │ self.auth_token = auth_token │
│ ❱ 120 │ │ self.load_model() │
│ 121 │ │
│ 122 │ def load_model(self) -> None: │
│ 123 │ │ """Load a pretrained diffusion generative model.""" │
│ │
│ /home/ec2-user/gt4sd-core/src/gt4sd/algorithms/generation/diffusion/implementation.py:139 in │
│ load_model │
│ │
│ 136 │ │ │ model_name_or_path = self.model_name │
│ 137 │ │ │
│ 138 │ │ if self.model_type == "stable_diffusion": │
│ ❱ 139 │ │ │ self.model = model_class.from_pretrained( │
│ 140 │ │ │ │ model_name_or_path, │
│ 141 │ │ │ │ use_auth_token=self.auth_token, │
│ 142 │ │ │ ) │
│ │
│ /home/ec2-user/mambaforge/envs/gt4sd/lib/python3.8/site-packages/diffusers/pipeline_utils.py:516 │
│ in from_pretrained │
│ │
│ 513 │ │ │ │ │ │ # call class_obj for nice error message of missing requirements │
│ 514 │ │ │ │ │ │ class_obj() │
│ 515 │ │ │ │ │ │
│ ❱ 516 │ │ │ │ │ raise ValueError( │
│ 517 │ │ │ │ │ │ f"The component {class_obj} of {pipeline_class} cannot be loaded │
│ 518 │ │ │ │ │ │ f" any of the loading methods defined in {ALL_IMPORTABLE_CLASSES │
│ 519 │ │ │ │ │ ) │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: The component <class 'transformers.models.clip.image_processing_clip.CLIPImageProcessor'> of <class
'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> cannot be loaded as it
does not seem to have any of the loading methods defined in {'ModelMixin': ['save_pretrained', 'from_pretrained'],
'SchedulerMixin': ['save_config', 'from_config'], 'DiffusionPipeline': ['save_pretrained', 'from_pretrained'],
'OnnxRuntimeModel': ['save_pretrained', 'from_pretrained'], 'PreTrainedTokenizer': ['save_pretrained',
'from_pretrained'], 'PreTrainedTokenizerFast': ['save_pretrained', 'from_pretrained'], 'PreTrainedModel':
['save_pretrained', 'from_pretrained'], 'FeatureExtractionMixin': ['save_pretrained', 'from_pretrained']}.
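The error comes from a duck-typing check in diffusers' `from_pretrained`: each pipeline component class must expose at least one known save/load method pair from the table in the error message. A simplified, illustrative sketch of that check (the helper and stand-in classes below are hypothetical, not diffusers' actual code):

```python
# Simplified version of the component check performed during pipeline loading.
# The mapping mirrors the one printed in the ValueError above (truncated here).
LOADABLE_METHODS = {
    "ModelMixin": ["save_pretrained", "from_pretrained"],
    "SchedulerMixin": ["save_config", "from_config"],
    "PreTrainedModel": ["save_pretrained", "from_pretrained"],
    # ... remaining entries as in the error message above
}

def is_loadable(component_cls) -> bool:
    """Return True if the class exposes any recognized loading method."""
    return any(
        hasattr(component_cls, method)
        for methods in LOADABLE_METHODS.values()
        for method in methods
    )

class NotLoadable:  # stand-in for a class diffusers 0.6.0 does not recognize
    pass

class Loadable:  # stand-in for e.g. a transformers PreTrainedModel subclass
    @classmethod
    def from_pretrained(cls, name):
        return cls()

print(is_loadable(Loadable), is_loadable(NotLoadable))  # True False
```

This is consistent with a transformers-version mismatch: a newer transformers ships a component class that diffusers 0.6.0 does not know how to load, so the check above fails for it.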
Hi Matteo! Was the RHEL system you mention without GPU?
That's the configuration I'm debugging here. No GPU, no CUDA.
The failure mode is that the existing conda.yml ends up installing a CUDA-requiring version of PyTorch. I believe this occurs even before conda reads the pip requirements, including the cpu_requirements.txt file.
> Hi Dean, thanks! It's a bit weird you had problems on Ubuntu 20.04 and RHEL 8, as our CI tests the first configuration (tests have been passing with no issues over the last few days) and we have recent RHEL 8 builds (the latest last week). Anyway, as long as the unit tests pass, I think we can merge the PR.
I agree it's a bit weird. I'd love to figure out why it seems to work in tests, but not on any CPU-only system I can get my hands on (without this little fix)—and I've tried a lot!