Open FrsECM opened 5 months ago
For people who may have this problem, I have a fix:
```python
import os
import re

import mltable
# MountOptions comes from azureml-dataprep, which backs mltable mounts
from azureml.dataprep.fuse.dprepfuse import MountOptions


def mount_options() -> MountOptions:
    max_size = None
    free_space_required = None
    cache_param = os.getenv('DATASET_MOUNT_CACHE_SIZE', None)
    if cache_param:
        # e.g. "100GB" (cap the cache size) or "-40GB" (required free space)
        CACHE_SIZE_PATTERN = r'^(?P<sign>-?)(?P<val>\d+).*(?P<size>[A-Z]{2})$'
        match = re.match(CACHE_SIZE_PATTERN, cache_param)
        if match:
            size = match.group('size')
            if size == 'GB':
                coeff = 1024 ** 3
            elif size == 'MB':
                coeff = 1024 ** 2
            else:
                raise NotImplementedError(f'Not implemented for size {size}')
            value = int(match.group('val')) * coeff
            if match.group('sign') == '-':
                # We are in mode "free_space_required"
                free_space_required = value
                print(f'MountOption : {value} Max Free Space')
            else:
                # We are in mode "max_size"
                max_size = value
                print(f'MountOption : {value} Max Size')
    return MountOptions(max_size=max_size, free_space_required=free_space_required)


# You can now consume your mltable
storage_paths = [
    {'folder': 'azureml://subscriptions/$sub/resourcegroups/$rg/workspaces/$ws/datastores/$ds/paths/'}
]
tbl = mltable.from_paths(storage_paths)
mount_context = tbl._mount(mount_options=mount_options())
mount_context.start()
```
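To make the sign convention concrete, here is a standalone sketch of the same parsing logic, with `MountOptions` replaced by a plain `(max_size, free_space_required)` tuple so it runs without the Azure ML SDK:

```python
import re

# Standalone illustration of the parsing logic in the fix above; returns a
# (max_size, free_space_required) tuple instead of MountOptions.
def parse_cache_size(cache_param):
    pattern = r'^(?P<sign>-?)(?P<val>\d+).*(?P<size>[A-Z]{2})$'
    match = re.match(pattern, cache_param)
    if not match:
        return None, None
    size = match.group('size')
    if size == 'GB':
        coeff = 1024 ** 3
    elif size == 'MB':
        coeff = 1024 ** 2
    else:
        raise NotImplementedError(f'Not implemented for size {size}')
    value = int(match.group('val')) * coeff
    if match.group('sign') == '-':
        # Negative value -> "keep at least this much disk free" mode
        return None, value
    # Positive value -> "cap the cache at this size" mode
    return value, None

# A negative value like "-40GB" selects free_space_required,
# a positive one like "10GB" selects max_size.
print(parse_cache_size('-40GB'))  # (None, 42949672960)
print(parse_cache_size('10GB'))   # (10737418240, None)
```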
If I do it this way it works, but it ignores the prune target:
In any case, this looks like a bug to me; the behaviour should be consistent with the documentation.
I have the same bug: data caching eats up all the space on a 64 GB disk, so I can't store training checkpoints.
I tried setting `DATASET_MOUNT_BLOCK_BASED_CACHE_ENABLED: true`, but an error arises because a boolean type can't be set.
When I set `DATASET_MOUNT_BLOCK_BASED_CACHE_ENABLED: "true"`, nothing happens and data keeps getting cached.
You can use the fix above: set the `DATASET_MOUNT_CACHE_SIZE` environment variable to a size and it should work.
But it should still be fixed upstream...
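For example, set the variable before the training process creates the mount (the exact value is up to you; per the fix above, a negative value means required free space):

```shell
# Keep at least 40 GB free on the compute disk; a positive value such as
# "100GB" would instead cap the total cache size.
export DATASET_MOUNT_CACHE_SIZE="-40GB"
echo "$DATASET_MOUNT_CACHE_SIZE"
```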
Another concern we have is that we cannot set other parameters, like these two:
That would let us fetch less data than the defaults do: with a shuffled dataloader there is no benefit in caching more blocks than the average image size.
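To give an order of magnitude (the figures below are hypothetical, not measured): with random access, every image read pulls in a whole cache block, so a block much larger than the average image wastes bandwidth:

```python
# Hypothetical figures to illustrate over-fetch with a shuffled dataloader
avg_image_size = 200 * 1024   # assumed average image: 200 KB
block_size = 2 * 1024 * 1024  # assumed mount read-block size: 2 MB

# Each random read downloads a full block, so the over-fetch factor is:
overfetch = block_size / avg_image_size
print(f'~{overfetch:.0f}x more data downloaded than the image actually needs')
```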
Would it be possible to open source mltable?
Operating System
Linux
Version Information
mltable-1.6.1 azureml-dataprep-rslex~=2.22.2dev0
Steps to reproduce
For example, in Azure Machine Learning:
In order to fix my issue, I need to add extra mount settings: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-read-write-data-v2?view=azureml-api-2&tabs=python#available-mount-settings
I use a wrapper class to do this across multiple storages/containers:
I also tried to add the environment variable in the job YAML:
But none of these solutions work.
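A hypothetical version of such a wrapper (names and structure are my own, not the original class; the URI template is the one from the storage paths above):

```python
# Hypothetical helper that builds the mltable path list for several
# datastores/containers of one workspace.
def datastore_paths(sub, rg, ws, datastores):
    base = ('azureml://subscriptions/{sub}/resourcegroups/{rg}'
            '/workspaces/{ws}/datastores/{ds}/paths/')
    return [{'folder': base.format(sub=sub, rg=rg, ws=ws, ds=ds)}
            for ds in datastores]

paths = datastore_paths('my-sub', 'my-rg', 'my-ws', ['images', 'labels'])
# each entry can then be fed to mltable.from_paths(...)
```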
Expected behavior
I expect the disk cache to be pruned when it reaches the -40GB limit (40 GB of free space required) on the compute machine.
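In effect, I expect the following check to run before each new block is cached (a hypothetical sketch of the documented behaviour, not the actual rslex implementation):

```python
def should_prune(free_bytes: int, free_space_required: int) -> bool:
    """Return True when the cache must be pruned to honour the threshold."""
    return free_bytes < free_space_required

GB = 1024 ** 3
# With DATASET_MOUNT_CACHE_SIZE="-40GB" -> free_space_required = 40 GB
print(should_prune(free_bytes=10 * GB, free_space_required=40 * GB))   # True
print(should_prune(free_bytes=100 * GB, free_space_required=40 * GB))  # False
```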
Actual behavior
Currently, the cache keeps growing:
until it fails:
even if I set the environment variables in the YAML:
or in code:
And I can confirm that the environment variables are set in the job:
But mltable seems to ignore them.
Additional information
No response