apple / coremltools

Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
https://coremltools.readme.io
BSD 3-Clause "New" or "Revised" License

API to locate and clean compiled files model.hwx .mlmodelc #1738

Open ashleysbutcher opened 1 year ago

ashleysbutcher commented 1 year ago

🌱 Describe your Feature Request

It would be great if these compiled files could be located and cleaned using coremltools.

How can this feature be used?

When experimenting with and evaluating many different model architectures during neural architecture search, disk space gets used up quickly.

Describe alternatives you've considered

I have tried to locate these temporary compiled files manually; however, I believe it would be more suitable if it were possible to locate and control them from the coremltools API. I have inspected the files generated by ANECompilerService, and it produces 100 GB quite quickly.

Additional context

Maybe there is some other command-line method I am not aware of, but it would be nice to have this as a feature of the API.

TobyRoseman commented 1 year ago

With the release of coremltools 6.2, we included a better way of cleaning up .mlmodelc files. Please try using coremltools 6.2 and see if that solves the .mlmodelc issue.

Are the model.hwx files stored in the same directories as the .mlmodelc files or somewhere else?

ashleysbutcher commented 1 year ago

I have upgraded coremltools. I am not too sure where the files are stored; I have been trying to figure this out so I can delete them. I have tried different means of monitoring files written to disk, but I am struggling to find their location. If I load 100 different models and run inference on them, Activity Monitor shows ANECompilerService writing about 5 GB to disk, and my used space goes from 165 GB to 170 GB. This is beyond my knowledge, but potentially the solution is for ANECompilerService to report the location of the files to Core ML / coremltools. I would like to clean up these temporary files created by ANECompilerService inside my Python loop that tests the latency and accuracy of each model, so that I can sample 10,000s of models without running out of disk space.
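In the meantime, a stdlib-only workaround is to track and clear a cache directory between evaluation iterations. This is a hypothetical helper sketch, not a coremltools API: `dir_size_bytes` and `clean_dir` are made-up names, and the directory here is a throwaway stand-in since the real ANECompilerService cache location is exactly what is unknown.

```python
import os
import shutil
import tempfile


def dir_size_bytes(path):
    """Total size in bytes of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total


def clean_dir(path):
    """Delete everything inside `path` but keep the directory itself."""
    for entry in os.listdir(path):
        full = os.path.join(path, entry)
        if os.path.isdir(full) and not os.path.islink(full):
            shutil.rmtree(full)
        else:
            os.remove(full)


# Demo against a throwaway directory standing in for a compile cache.
cache = tempfile.mkdtemp()
with open(os.path.join(cache, "model.bin"), "wb") as f:
    f.write(b"\0" * 1024)
print(dir_size_bytes(cache))  # 1024
clean_dir(cache)
print(dir_size_bytes(cache))  # 0
```

Calling `clean_dir` at the end of each loop iteration keeps accumulated artifacts bounded, provided the directory being cleaned is actually the one the compiler writes into.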

junpeiz commented 1 year ago

@ashleysbutcher Thank you for describing your use case. By setting package_dir in ct.convert, you have full control over where the temporary files are stored. For example:

import os
import tempfile
import uuid

import coremltools as ct

tmp_files_dir = tempfile.TemporaryDirectory()

# Convert 10000s of models
for i in range(10000):
    ct.convert(..., package_dir=os.path.join(tmp_files_dir.name, str(uuid.uuid4()) + ".mlpackage"))

# Clean up tmp_files_dir manually
tmp_files_dir.cleanup()

ashleysbutcher commented 1 year ago

The problem I have at the moment is that I generate lots of neural networks as .mlmodel files, and the files are created by ANECompilerService when I load and run them, not when they are converted. I will try this code anyway and see if creating .mlpackages instead of .mlmodels helps.
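For any per-model artifacts that do land in a directory you control (e.g. via package_dir), scoping each iteration with a context manager guarantees cleanup even when an evaluation step fails. A minimal stdlib sketch, in which `evaluate_candidate` is a placeholder for loading and timing one model:

```python
import os
import tempfile


def evaluate_candidate(workdir, i):
    """Placeholder: pretend to compile/run a model, writing an artifact."""
    artifact = os.path.join(workdir, f"candidate_{i}.mlpackage")
    with open(artifact, "w") as f:
        f.write("compiled artifact placeholder")
    return i  # stand-in for a latency/accuracy score


scores = []
for i in range(3):
    # Each iteration's files are deleted as soon as the `with` block exits.
    with tempfile.TemporaryDirectory() as workdir:
        scores.append(evaluate_candidate(workdir, i))
    assert not os.path.exists(workdir)  # directory and artifacts are gone
print(scores)  # [0, 1, 2]
```

This only helps for files written under `workdir`; it does not reach files that ANECompilerService writes to its own internal cache.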

junpeiz commented 1 year ago

The ANECompilerService you mentioned is not in this coremltools repo, right? I just want to confirm whether your use case is related to this repo. May I know how you load and run the models: via Swift, or via coremltools?

ashleysbutcher commented 1 year ago

I believe ANECompilerService is part of macOS, but I am not an expert. I just noticed that it gets invoked and creates 100 GB of files when I run 1,000 different models with coremltools. I run the models using coremltools, not Swift. It would be interesting to see whether running these models in Swift causes the same problem.