Closed delucchi-cmu closed 2 months ago
Attention: Patch coverage is 99.10714% with 1 line in your changes missing coverage. Please review.
Project coverage is 93.75%. Comparing base (047600e) to head (6cf6d84).
Files | Patch % | Lines |
---|---|---|
src/hipscat/io/file_io/file_pointer.py | 90.00% | 1 Missing :warning: |
Before [047600e6] <v0.3.7> | After [b3ae0e21] | Ratio | Benchmark (Parameter) |
---|---|---|---|
19.4±0.3ms | 19.4±0.4ms | 1.01 | benchmarks.MetadataSuite.time_load_partition_info_order6 |
378±4ms | 383±4ms | 1.01 | benchmarks.Suite.time_outer_pixel_alignment |
41.2±0.7ms | 41.8±0.7ms | 1.01 | benchmarks.Suite.time_pixel_tree_creation |
119±0.4ms | 120±0.5ms | 1.01 | benchmarks.time_test_alignment_even_sky |
77.5±0.9ms | 77.7±0.9ms | 1 | benchmarks.MetadataSuite.time_load_partition_info_order7 |
78.0±1ms | 77.6±0.7ms | 0.99 | benchmarks.MetadataSuite.time_load_partition_join_info |
89.4±2ms | 88.4±2ms | 0.99 | benchmarks.Suite.time_paths_creation |
13.3±0.4ms | 13.1±0.3ms | 0.98 | benchmarks.Suite.time_inner_pixel_alignment |
1.00±0.02ms | 971±5μs | 0.97 | benchmarks.time_test_cone_filter_multiple_order |
I tried to run it
import gdrivefs
import lsdb
gdfs = gdrivefs.GoogleDriveFileSystem(token='cache', root_file_id='1mocyakfy_8OgFGOIQ813S7POqwdDtfX_')
lsdb.read_hipscat('', file_system=gdfs)
it failed with
FileNotFoundError: [Errno 2] No such file or directory: '/Users/hombit/projects/lincc-frameworks/lsdb/catalog_info.json'
While debugging I found that `file_system` is `None` in `read_from_metadata_file()`.
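The local path in the traceback is consistent with the reader falling back to the local file system when `file_system` is `None`: an empty catalog path then resolves against the current working directory. A minimal sketch of that resolution, using only the standard library (the helper name `resolve_local` is hypothetical and not part of hipscat or lsdb):

```python
import os


def resolve_local(catalog_path: str, filename: str = "catalog_info.json") -> str:
    """Resolve a catalog file the way a local-only reader would (hypothetical helper)."""
    # With no remote file system in play, a relative path is anchored at the cwd.
    return os.path.abspath(os.path.join(catalog_path, filename))


# An empty catalog path ends up pointing at <cwd>/catalog_info.json.
print(resolve_local(""))
```

This would explain why the error names a file under the lsdb project directory rather than anything on Google Drive.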
The GDFS implementation is weird, and I've found that I can't reference files if I pass the hipscat catalog as the `root_file_id`. If you have some directory structure in Google Drive, like the following:
└── [file_id=1000] hipscat/
    └── [file_id=0100] catalogs/
        ├── [file_id=0101] catalog_a/
        └── [file_id=0102] catalog_b/
Then you can do something like
gdfs = gdrivefs.GoogleDriveFileSystem(token='cache', root_file_id='1000')
lsdb.read_hipscat('catalogs/catalog_a', file_system=gdfs)
or
gdfs = gdrivefs.GoogleDriveFileSystem(token='cache', root_file_id='0100')
lsdb.read_hipscat('catalog_a', file_system=gdfs)
I don't know why this is the case.
@delucchi-cmu it still doesn't work for me =(
import gdrivefs
import lsdb
gdfs = gdrivefs.GoogleDriveFileSystem(token='cache', root_file_id='17_8v782e6kK22hAJ_p1AzHXmjLpKFT4w')
lsdb.read_hipscat('gaia_dr3_pm_greater_100', file_system=gdfs)
fails with
FileNotFoundError: [Errno 2] No such file or directory: '/Users/hombit/projects/lincc-frameworks/lsdb/gaia_dr3_pm_greater_100/catalog_info.json'
Yet I can do `gdfs.open("gaia_dr3_pm_greater_100/catalog_info.json").read()` successfully.
Change Description
Closes #307
Solution Description
Passes any user-provided `file_system` object along to fsspec calls.
Code Quality