CARTAvis / carta

To CARTA users, this repo holds the CARTA release packages. Please use this repo to log bugs and feature requests. These will be triaged by the development team and prioritised as necessary in the development cycles.

Long load times in directories with many files #1

Closed low-sky closed 2 years ago

low-sky commented 4 years ago

CARTA (v1.2) takes a long time to load when I start it in a directory with a large number of files (>100), even though I'm calling it with a specific file from the command line, e.g.,

carta file0001.fits

The long load time appears to come from scanning all the files in the directory before loading. Is it possible to skip the scan when CARTA is called from the command line in this fashion?

veggiesaurus commented 4 years ago

@low-sky does this directory have lots of image files? @pford is working on changes that should speed this scenario up dramatically, because we won't open each FITS file and detect the HDU list and HDU types, but will instead just read the magic number (SIMPLE = T) to determine if they are FITS.

You can track progress of this feature here (backend) and here (frontend)
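
For illustration, a minimal Python sketch of the kind of magic-number check described above: read only the first 80-byte header card and test for `SIMPLE  =` with the logical value `T`, instead of opening the file and walking its HDUs. This is not CARTA's implementation; the function name `looks_like_fits` is hypothetical, and the check assumes the fixed-format first card of the FITS standard.

```python
def looks_like_fits(path):
    """Return True if the file begins with a FITS 'SIMPLE  =  T' header card."""
    with open(path, "rb") as f:
        card = f.read(80)  # a FITS header card is exactly 80 bytes
    if len(card) < 80:
        return False
    # Keyword occupies bytes 1-8, '=' is byte 9, and the logical value
    # is right-justified so that 'T' lands in byte 30.
    return card[:9] == b"SIMPLE  =" and card[10:30].strip() == b"T"
```

Only 80 bytes are read per file, so the cost per file does not depend on the file size or the number of HDUs.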

low-sky commented 4 years ago

Sorry to be unclear: yes, this happens for directories with lots of image FITS files.

veggiesaurus commented 4 years ago

Expected to be included in the 1.4 release.

keflavich commented 3 years ago

This remains my main pain point; I've frequently run CARTA sessions into the ground by accidentally (or necessarily) navigating to a folder that contains too many files.

I see that there's a lot of progress on the linked PRs, but I don't understand the details. Could a dev perhaps provide a user-facing update on the status of the file listing improvements?

Note also that https://github.com/CARTAvis/carta-backend/issues/431 is closely related.

kswang1029 commented 3 years ago

The improvements include:

Would these be sensible?

kswang1029 commented 3 years ago

@keflavich Just curious, would you mind if we just showed all files, without file type parsing, at the file list request level? The actual type parsing would then happen at the file info request level. This has implications for UX. 🤔

keflavich commented 3 years ago

I would greatly prefer if we just saw the filenames and sizes (simple ls -lh output) rather than the metadata; I almost never use the metadata, and I certainly never look at the metadata for all files in a folder.

Is the UI you're proposing, say, list all files, then once a user clicks on the file, determine whether or not it can be opened? I'd be happy with that for sure.

I also liked the suggestion I saw in one thread of being able to select files by type. I would love to filter by suffix, e.g., show only image.tt0 or only .residual or only .psf files, say, especially if that meant the folder would load faster!
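
As a rough sketch of what suffix filtering could look like, the Python snippet below selects entries by name alone, without opening or stat-ing any file. It is only an illustration of the idea, not CARTA's file browser code; `list_by_suffix` is a hypothetical helper.

```python
import os

def list_by_suffix(directory, suffixes):
    """Return sorted entry names in `directory` that end with any of `suffixes`."""
    suffixes = tuple(suffixes)  # str.endswith accepts a tuple of candidates
    with os.scandir(directory) as entries:
        # Only the entry name is inspected; no file is opened or stat-ed here.
        return sorted(e.name for e in entries if e.name.endswith(suffixes))
```

For example, `list_by_suffix(".", [".residual", ".psf"])` would return only the names ending in those suffixes.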

veggiesaurus commented 3 years ago

> I would greatly prefer if we just saw the filenames and sizes (simple ls -lh output) rather than the metadata; I almost never use the metadata, and I certainly never look at the metadata for all files in a folder.

I think we should add a preference "show all files in file browser". This would make the file list very quick (although we'd still need to do some checks for folder-based image formats), but it would list all files rather than only the supported ones.

keflavich commented 3 years ago

Right, yes, the difference between "folders that are files" and "folders that should be browsed to" makes the problem trickier than I implied! Still, just showing them all, then deciding later if they're images or not, would be nice.

veggiesaurus commented 2 years ago

Fixed in the upcoming 3.0-beta.2 release.

keflavich commented 2 years ago

I'm using 3.0-beta.2, and with file list set either to "All Files" or "Filter by extension", it is still very slow to show all files. I see that, independent of filtering technique, it still shows the size of all the files in GB - that means it must be doing some sort of file size inspection. Is there any way to turn that off? This is my main bottleneck in using CARTA right now; I have to wait ~30s-few minutes every time I want to load a new file.

veggiesaurus commented 2 years ago

> I'm using 3.0-beta.2, and with file list set either to "All Files" or "Filter by extension", it is still very slow to show all files. I see that, independent of filtering technique, it still shows the size of all the files in GB - that means it must be doing some sort of file size inspection. Is there any way to turn that off? This is my main bottleneck in using CARTA right now; I have to wait ~30s-few minutes every time I want to load a new file.

This is curious. We're basically just using stat to get file info (file size, last modified etc) when filtering by extension or showing all files. I'm not really sure how to speed that process up. I'm not sure how this could be significantly slower than ls -lh. In most of our tests, 25K files on an old hard drive was no problem whatsoever.

Can you remind me again what sort of filesystem you're using?

keflavich commented 2 years ago

OK, that's interesting - stat * is nearly instantaneous in a ~500 image directory. But it took >30s on a 40,000-file directory.

That same ~500-image directory takes nearly a minute to load in CARTA.

The filesystem is lustre-based. It's not very high-performance, and the support team specifically encouraged me to avoid / limit the use of ls -lh (as opposed to ls) when possible.
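
To separate the cost of reading the directory from the cost of the per-file metadata lookups, something like the following Python sketch could be run on the slow directory. It is only a rough benchmark under simple assumptions, not a model of what CARTA does internally; `time_listing` is a hypothetical helper, and results can differ between a first and a repeated run because metadata may be cached.

```python
import os
import time

def time_listing(directory):
    """Time a names-only listing versus a listing that also stats every entry."""
    t0 = time.perf_counter()
    names = os.listdir(directory)  # comparable to plain `ls`: directory read only
    t1 = time.perf_counter()
    sizes = {}
    for name in names:
        # Comparable to `ls -l`: one metadata lookup per entry.
        st = os.lstat(os.path.join(directory, name))
        sizes[name] = st.st_size
    t2 = time.perf_counter()
    return {
        "entries": len(names),
        "names_only_s": t1 - t0,
        "with_stat_s": t2 - t1,
    }
```

A large gap between `names_only_s` and `with_stat_s` would point at the metadata lookups rather than the directory read.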

veggiesaurus commented 2 years ago

> OK, that's interesting - stat * is nearly instantaneous in a ~500 image directory. But it took >30s on a 40,000-file directory.
>
> That same ~500-image directory takes nearly a minute to load in CARTA.
>
> The filesystem is lustre-based. It's not very high-performance, and the support team specifically encouraged me to avoid / limit the use of ls -lh (as opposed to ls) when possible.

Ok. I think we'll have to do some additional benchmarking with Lustre (@ajm-asiaa perhaps you could do so?). I've just tested on our CephFS remote filesystem, and 25000 files showed up in CARTA within 1 second 🤔 A minute to produce a file list of 500 files seems way out of the ordinary.

Jordatious commented 2 years ago

I also experience this problem on ilifu. I just tested again on carta-testing.

veggiesaurus commented 2 years ago

> I also experience this problem on ilifu. I just tested again on carta-testing.

Can you compare this to the time it takes to run ls -lh? Note that things might be cached after you ls them once.

kswang1029 commented 2 years ago

Could it be that the disk arrays were hibernating and took time to wake up? I did a quick test on ASIAA's Lustre with 25000 mixed files and folders, and the list showed up within a second.

ajm-ska commented 2 years ago

We have a convenient test folder on our Lustre system containing 25012 CASA images of 106kB each. The CARTA Filebrowser took 4 minutes 35 seconds to process them. At least it was showing "Loading file list" progress, otherwise I would have thought it was frozen. The second time I opened the folder, it did indeed process faster, only taking about 30 seconds.

If I just use the terminal, ls -lh only takes ~5 seconds before all the files are listed. The stat * command is instantaneous, but it takes about 15 seconds for all the information to be printed on the screen.

ajm-ska commented 2 years ago

The Lustre lfs getstripe command shows that the images in our test folder have a "stripe_count" of 1, as expected, since they are tiny files in this case. Each image has a different "stripe_offset", so I imagine they are being read from different OSTs. I wonder, though, whether larger files composed of multiple stripes could be processed faster.

Our Lustre system seems to be functioning fine as the lfs check servers command shows all OSTs are active.

veggiesaurus commented 2 years ago

> We have a convenient test folder on our Lustre system containing 25012 CASA images of 106kB each. The CARTA Filebrowser took 4 minutes 35 seconds to process them. At least it was showing "Loading file list" progress, otherwise I would have thought it was frozen. The second time I opened the folder, it did indeed process faster, only taking about 30 seconds.
>
> If I just use the terminal, ls -lh only takes ~5 seconds before all the files are listed. The stat * command is instantaneous, but it takes about 15 seconds for all the information to be printed on the screen.

Did you have your Carta front-end set to filter by file extension rather than content type?

ajm-ska commented 2 years ago

> Did you have your Carta front-end set to filter by file extension rather than content type?

No. It was initially set to Filter by file content. After changing it to Filter by extension, it now takes 20 seconds (from the cached state). So it is about 10 seconds faster.

ajm-ska commented 2 years ago

Our ASIAA CARTA public demo server originally mounted all its test images from our Lustre system. That has since changed and now only uses a small internal HDD array in ext4 format. We don't have space to put all the original test images there. But just as an experiment, I just copied over the folder with the 25012 test images (set_lotsFiles2). It takes about 5 seconds to process from the HDD array! So there definitely seems to be poor performance when using Lustre with CARTA. The strange thing is, I don't remember it being so slow with Lustre before.