Open robin1900 opened 3 years ago
Do you have tried to copy this file from lustre to local file system? How many time does it take?
Do you have tried to copy this file from lustre to local file system? How many time does it take?
yes . only take 2 seconds
You can check throughput with https://docs.alluxio.io/os/user/stable/en/operation/Admin-CLI.html#metrics
. Are they using the same machine to copy file, and did you clear page cache. Actually , If it took 2 seconds to copy 1.8 GiB file, it means nearly 1 GiB/second. Can you confirm this?
You can check throughput with
https://docs.alluxio.io/os/user/stable/en/operation/Admin-CLI.html#metrics
. Are they using the same machine to copy file, and did you clear page cache. Actually , If it took 2 seconds to copy 1.8 GiB file, it means nearly 1 GiB/second. Can you confirm this?
yes, I cleaned zhe page cache. it takes 4 seconds from lustre. the lustre is small cluster with 3 nodes , and base on ssd. so it is fast
You can check throughput with
https://docs.alluxio.io/os/user/stable/en/operation/Admin-CLI.html#metrics
. Are they using the same machine to copy file, and did you clear page cache. Actually , If it took 2 seconds to copy 1.8 GiB file, it means nearly 1 GiB/second. Can you confirm this?
I got the metrics for copy the file(before copy I cleared metrics already):
Cluster.BytesReadDirect (Type: COUNTER, Value: 0B) Cluster.BytesReadDirectThroughput (Type: GAUGE, Value: 0B/MIN) Cluster.BytesReadDomain (Type: COUNTER, Value: 0B) Cluster.BytesReadDomainThroughput (Type: GAUGE, Value: 0B/MIN) Cluster.BytesReadLocal (Type: COUNTER, Value: 0B) Cluster.BytesReadLocalThroughput (Type: GAUGE, Value: 0B/MIN) Cluster.BytesReadRemote (Type: COUNTER, Value: 84.17GB) Cluster.BytesReadRemoteThroughput (Type: GAUGE, Value: 42.09GB/MIN) Cluster.BytesReadUfsAll (Type: COUNTER, Value: 0B) Cluster.BytesReadUfsThroughput (Type: GAUGE, Value: 0B/MIN) Cluster.BytesWrittenDomain (Type: COUNTER, Value: 0B) Cluster.BytesWrittenDomainThroughput (Type: GAUGE, Value: 0B/MIN) Cluster.BytesWrittenLocal (Type: COUNTER, Value: 0B) Cluster.BytesWrittenLocalThroughput (Type: GAUGE, Value: 0B/MIN) Cluster.BytesWrittenRemote (Type: COUNTER, Value: 0B) Cluster.BytesWrittenRemoteThroughput (Type: GAUGE, Value: 0B/MIN) Cluster.BytesWrittenUfsAll (Type: COUNTER, Value: 0B) Cluster.BytesWrittenUfsThroughput (Type: GAUGE, Value: 0B/MIN) Cluster.CapacityFree (Type: GAUGE, Value: 571,518,496,412) Cluster.CapacityFreeTierHDD (Type: GAUGE, Value: 0) Cluster.CapacityFreeTierMEM (Type: GAUGE, Value: 91,555,901,084) Cluster.CapacityFreeTierSSD (Type: GAUGE, Value: 479,962,595,328) Cluster.CapacityTotal (Type: GAUGE, Value: 579,820,584,960) Cluster.CapacityTotalTierHDD (Type: GAUGE, Value: 0) Cluster.CapacityTotalTierMEM (Type: GAUGE, Value: 96,636,764,160) Cluster.CapacityTotalTierSSD (Type: GAUGE, Value: 483,183,820,800) Cluster.CapacityUsed (Type: GAUGE, Value: 8,302,088,548) Cluster.CapacityUsedTierHDD (Type: GAUGE, Value: 0) Cluster.CapacityUsedTierMEM (Type: GAUGE, Value: 5,080,863,076) Cluster.CapacityUsedTierSSD (Type: GAUGE, Value: 3,221,225,472) Cluster.RootUfsCapacityFree (Type: GAUGE, Value: 193,345,646,592) Cluster.RootUfsCapacityTotal (Type: GAUGE, Value: 211,243,999,232) Cluster.RootUfsCapacityUsed (Type: GAUGE, Value: 17,898,352,640) Cluster.Workers (Type: GAUGE, Value: 3) Master.AbsentCacheHits (Type: COUNTER, Value: 0) Master.AbsentCacheInvalidations (Type: COUNTER, Value: 0) Master.AbsentCacheMisses (Type: COUNTER, Value: 0) Master.AbsentCacheSize (Type: GAUGE, Value: 0) Master.CompleteFileOps (Type: COUNTER, Value: 0) Master.ConnectFromMaster.UFS:%2Fjournal%2FMetricsMaster.UFS_TYPE:local (Type: TIMER, Value: 0) Master.Create.UFS:%2Fjournal%2FMetricsMaster.UFS_TYPE:local (Type: TIMER, Value: 0) Master.CreateDirectoryOps (Type: COUNTER, Value: 0) Master.CreateFileOps (Type: COUNTER, Value: 0) Master.DeletePathOps (Type: COUNTER, Value: 0) Master.DirectoriesCreated (Type: COUNTER, Value: 0) Master.EdgeCacheEvictions (Type: COUNTER, Value: 0) Master.EdgeCacheHits (Type: COUNTER, Value: 2,136,970) Master.EdgeCacheLoadTimes (Type: COUNTER, Value: 0) Master.EdgeCacheMisses (Type: COUNTER, Value: 0) Master.EdgeCacheSize (Type: GAUGE, Value: 123,438) Master.EdgeLockPoolSize (Type: GAUGE, Value: 123,439) Master.FileBlockInfosGot (Type: COUNTER, Value: 0) Master.FileInfosGot (Type: COUNTER, Value: 15) Master.FilesCompleted (Type: COUNTER, Value: 0) Master.FilesCreated (Type: COUNTER, Value: 0) Master.FilesFreed (Type: COUNTER, Value: 0) Master.FilesPersisted (Type: COUNTER, Value: 0) Master.FilesPinned (Type: GAUGE, Value: 0) Master.FreeFileOps (Type: COUNTER, Value: 0) Master.GetAcl.UFS:%2Fjournal%2FMetricsMaster.UFS_TYPE:local (Type: TIMER, Value: 0) Master.GetAcl.User:root.UFS:%2Fjournal%2FMetricsMaster.UFS_TYPE:local (Type: TIMER, Value: 0) Master.GetBlockMasterInfo.User:root (Type: TIMER, Value: 88) Master.GetFileBlockInfoOps (Type: COUNTER, Value: 0) Master.GetFileInfoOps (Type: COUNTER, Value: 4) Master.GetFileLocations.User:root.UFS:%2Fjournal%2FMetricsMaster.UFS_TYPE:local (Type: TIMER, Value: 0) Master.GetNewBlockOps (Type: COUNTER, Value: 0) Master.GetSpace.UFS:%2Fjournal%2FMetricsMaster.UFS_TYPE:local (Type: TIMER, Value: 0) Master.GetSpace.User:root.UFS:%2Fjournal%2FMetricsMaster.UFS_TYPE:local (Type: TIMER, Value: 9) Master.GetStatus.User:root (Type: TIMER, Value: 2) Master.GetStatus.User:root.UFS:%2Fjournal%2FMetricsMaster.UFS_TYPE:local (Type: TIMER, Value: 0) Master.GetWorkerInfoList.User:root (Type: TIMER, Value: 2) Master.InodeCacheEvictions (Type: COUNTER, Value: 0) Master.InodeCacheHits (Type: COUNTER, Value: 13,135,456) Master.InodeCacheLoadTimes (Type: COUNTER, Value: 0) Master.InodeCacheMisses (Type: COUNTER, Value: 0) Master.InodeCacheSize (Type: GAUGE, Value: 123,439) Master.InodeLockPoolSize (Type: GAUGE, Value: 123,439) Master.JournalFlushTimer (Type: TIMER, Value: 0) Master.LastBackupEntriesCount (Type: GAUGE, Value: -1) Master.LastBackupRestoreCount (Type: GAUGE, Value: -1) Master.LastBackupRestoreTimeMs (Type: GAUGE, Value: -1) Master.LastBackupTimeMs (Type: GAUGE, Value: -1) Master.ListStatus.UFS:%2Fjournal%2FMetricsMaster.UFS_TYPE:local (Type: TIMER, Value: 15) Master.ListStatus.User:root (Type: TIMER, Value: 2) Master.ListStatus.User:root.UFS:%2Fjournal%2FMetricsMaster.UFS_TYPE:local (Type: TIMER, Value: 0) Master.ListingCacheEvictions (Type: COUNTER, Value: 0) Master.ListingCacheHits (Type: COUNTER, Value: 2) Master.ListingCacheLoadTimes (Type: COUNTER, Value: 0) Master.ListingCacheMisses (Type: COUNTER, Value: 0) Master.ListingCacheSize (Type: GAUGE, Value: 124,005) Master.MountOps (Type: COUNTER, Value: 0) Master.NewBlocksGot (Type: COUNTER, Value: 0) Master.PathsDeleted (Type: COUNTER, Value: 0) Master.PathsMounted (Type: COUNTER, Value: 0) Master.PathsRenamed (Type: COUNTER, Value: 0) Master.PathsUnmounted (Type: COUNTER, Value: 0) Master.PerUfsOpConnectFromMaster.UFS:%2Fjournal%2FMetricsMaster (Type: GAUGE, Value: 0) Master.PerUfsOpCreate.UFS:%2Fjournal%2FMetricsMaster (Type: GAUGE, Value: 0) Master.PerUfsOpGetFileLocations.UFS:%2Fjournal%2FMetricsMaster (Type: GAUGE, Value: 0) Master.PerUfsOpGetSpace.UFS:%2Fjournal%2FMetricsMaster (Type: GAUGE, Value: 9) Master.PerUfsOpGetStatus.UFS:%2Fjournal%2FMetricsMaster (Type: GAUGE, Value: 0) Master.PerUfsOpListStatus.UFS:%2Fjournal%2FMetricsMaster (Type: GAUGE, Value: 15) Master.PerUfsSavedOpGET_FILE_INFO.UFS:%2FunderFSStorage (Type: COUNTER, Value: 13) Master.PerUfsSavedOpLIST_STATUS.UFS:%2FunderFSStorage (Type: COUNTER, Value: 2) Master.RenamePathOps (Type: COUNTER, Value: 0) Master.SetAclOps (Type: COUNTER, Value: 0) Master.SetAttributeOps (Type: COUNTER, Value: 0) Master.TotalPaths (Type: GAUGE, Value: 123,439) Master.UfsJournalInitialReplayTimeMs (Type: GAUGE, Value: -1) Master.UfsSessionCount-Ufs:%2Fjournal%2FMetricsMaster (Type: COUNTER, Value: 0) Master.UfsSessionCount-Ufs:%2FunderFSStorage (Type: COUNTER, Value: 0) Master.UnmountOps (Type: COUNTER, Value: 0) Master.blockHeartbeat.User:root (Type: TIMER, Value: 468) Master.clearMetrics.User:root (Type: TIMER, Value: 0) Master.close.UFS:%2Fjournal%2FMetricsMaster.UFS_TYPE:local (Type: TIMER, Value: 0) Master.commitBlock.User:root (Type: TIMER, Value: 111) Master.getConfigHash (Type: TIMER, Value: 12) Master.getConfigHash.User:root (Type: TIMER, Value: 33) Master.getConfiguration (Type: TIMER, Value: 2) Master.getConfiguration.User:root (Type: TIMER, Value: 103) Master.getMasterInfo.User:root (Type: TIMER, Value: 88) Master.getMetrics.User:root (Type: TIMER, Value: 3) Master.getPinnedFileIds.User:root (Type: TIMER, Value: 468) Master.getUfsInfo.User:root (Type: TIMER, Value: 1) Master.getWorkerId.User:root (Type: TIMER, Value: 0) Master.registerWorker.User:root (Type: TIMER, Value: 0)
I suggest you to repeat the experiment and check Cluster.BytesReadUfsThroughput (Type: GAUGE, Value: 0B/MIN)
. Thanks. @apc999 Any sugggestions.
What is your environment(Kubernetes version, Fluid version, etc.)
Fluid 0.60 alluxioruntime , use lustre as backend storage
Describe the bug
run in pod with dataset bound
time cp alluxio-2.6.2-bin.tar.gz /dev/null
real 1m36.357s user 0m0.014s sys 0m1.047s
the file is about 1.8GB , why read file is so slow for the first time
apiVersion: data.fluid.io/v1alpha1 kind: Dataset metadata: name: lustre spec: mounts:
apiVersion: data.fluid.io/v1alpha1 kind: AlluxioRuntime metadata: name: lustre spec:
Add fields here
replicas: 3 data: replicas: 2
tieredstore: levels:
mediumtype: SSD path: /data/alluxio quota: 150Gi high: "0.9" low: "0.7"
properties:
jni-fuse related configurations
alluxio.fuse.jnifuse.enabled: "true" alluxio.user.client.cache.enabled: "false" alluxio.user.client.cache.store.type: MEMORY alluxio.user.client.cache.dir: /alluxio/ram alluxio.user.client.cache.page.size: 2MB alluxio.user.client.cache.size: 1800MB
alluxio master
alluxio.master.metastore: ROCKS alluxio.master.metastore.inode.cache.max.size: "10000000" alluxio.master.journal.log.size.bytes.max: 500MB alluxio.master.metadata.sync.concurrency.level: "128" alluxio.master.metadata.sync.executor.pool.size: "128" alluxio.master.metadata.sync.ufs.prefetch.pool.size: "128"
alluxio configurations
alluxio.user.block.worker.client.pool.min: "512" alluxio.fuse.debug.enabled: "false" alluxio.web.ui.enabled: "false" alluxio.user.file.writetype.default: THROUGH alluxio.user.ufs.block.read.location.policy: alluxio.client.block.policy.LocalFirstPolicy alluxio.user.block.write.location.policy.class: alluxio.client.block.policy.LocalFirstAvoidEvictionPolicy alluxio.worker.allocator.class: alluxio.worker.block.allocator.GreedyAllocator alluxio.user.block.size.bytes.default: 16MB alluxio.user.streaming.reader.chunk.size.bytes: 32MB alluxio.user.local.reader.chunk.size.bytes: 32MB alluxio.worker.network.reader.buffer.size: 32MB alluxio.worker.file.buffer.size: 320MB alluxio.user.metrics.collection.enabled: "false" alluxio.master.rpc.executor.max.pool.size: "1024" alluxio.master.rpc.executor.core.pool.size: "128"
alluxio.master.mount.table.root.readonly: "true"
alluxio.user.update.file.accesstime.disabled: "true" alluxio.user.file.passive.cache.enabled: "false" alluxio.user.block.avoid.eviction.policy.reserved.size.bytes: 2GB alluxio.master.journal.folder: /journal alluxio.master.journal.type: UFS alluxio.user.block.master.client.pool.gc.threshold: 2day alluxio.user.file.master.client.threads: "1024" alluxio.user.block.master.client.threads: "1024" alluxio.user.file.readtype.default: CACHE alluxio.security.stale.channel.purge.interval: 365d alluxio.user.metadata.cache.enabled: "true" alluxio.user.metadata.cache.expiration.time: 2day alluxio.user.metadata.cache.max.size: "1000000" alluxio.user.direct.memory.io.enabled: "true" alluxio.fuse.cached.paths.max: "1000000" alluxio.job.worker.threadpool.size: "164" alluxio.user.worker.list.refresh.interval: 2min alluxio.user.logging.threshold: 1000ms alluxio.fuse.logging.threshold: 1000ms alluxio.worker.block.master.client.pool.size: "1024" master: jvmOptions: