apache / incubator-baremaps

Create custom vector tiles from OpenStreetMap and other data sources with Postgis and Java.
baremaps.apache.org
Apache License 2.0

Improve naming in data frame abstraction #857

Closed. bchapuis closed this 1 month ago

bchapuis commented 1 month ago

This is an attempt at improving the data frame abstraction. The following diagram describes the overall architecture.

classDiagram
    class DataColumn {
        +String name()
        +Type type()
    }
    class DataSchema {
        +String name()
        +List<DataColumn> columns()
        +DataRow createRow()
    }
    class DataRow {
        +DataSchema schema()
        +List<?> values()
        +Object get(String column)
        +Object get(int index)
        +void set(String column, Object value)
        +void set(int index, Object value)
        +DataRow with(String column, Object value)
        +DataRow with(int index, Object value)
    }
    class DataTable {
        +DataSchema schema()
        +boolean add(DataRow row)
        +void clear()
        +long size()
        +Iterator<DataRow> iterator()
    }
    class DataStore {
        +List<String> list()
        +DataTable get(String name)
        +void add(DataTable table)
        +void add(String name, DataTable table)
        +void remove(String name)
    }
    DataStore --> DataTable : has
    DataTable --> DataRow : has
    DataTable --> DataSchema : follows
    DataRow --> DataSchema : follows
    DataSchema --> DataColumn : has
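
For readers who prefer code to diagrams, here is a minimal Java sketch of the interfaces as drawn above. The interface and method names come from the diagram. Everything else is an assumption made for illustration and is not taken from the Baremaps code base: the `Type` enum constants, `DataTable` extending `Iterable<DataRow>` to provide `iterator()`, `with(...)` being implemented as `set(...)` followed by returning the same row, and the hypothetical `SimpleColumn`, `SimpleSchema`, `SimpleRow`, and `MemoryTable` classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Interfaces transcribed from the diagram; all other names are hypothetical.
interface DataColumn {
  enum Type { STRING, INTEGER, DOUBLE } // placeholder for the real Type
  String name();
  Type type();
}

interface DataSchema {
  String name();
  List<DataColumn> columns();
  DataRow createRow();
}

interface DataRow {
  DataSchema schema();
  List<?> values();
  Object get(String column);
  Object get(int index);
  void set(String column, Object value);
  void set(int index, Object value);
  DataRow with(String column, Object value);
  DataRow with(int index, Object value);
}

// Assumption: Iterable supplies the iterator() method shown in the diagram.
interface DataTable extends Iterable<DataRow> {
  DataSchema schema();
  boolean add(DataRow row);
  void clear();
  long size();
}

interface DataStore {
  List<String> list();
  DataTable get(String name);
  void add(DataTable table);
  void add(String name, DataTable table);
  void remove(String name);
}

// Hypothetical in-memory implementations, for illustration only.
final class SimpleColumn implements DataColumn {
  private final String name;
  private final Type type;
  SimpleColumn(String name, Type type) { this.name = name; this.type = type; }
  public String name() { return name; }
  public Type type() { return type; }
}

final class SimpleSchema implements DataSchema {
  private final String name;
  private final List<DataColumn> columns;
  SimpleSchema(String name, List<DataColumn> columns) {
    this.name = name;
    this.columns = List.copyOf(columns);
  }
  public String name() { return name; }
  public List<DataColumn> columns() { return columns; }
  public DataRow createRow() { // one null value per column
    return new SimpleRow(this, new ArrayList<>(Collections.<Object>nCopies(columns.size(), null)));
  }
  int indexOf(String column) {
    for (int i = 0; i < columns.size(); i++) {
      if (columns.get(i).name().equals(column)) return i;
    }
    throw new IllegalArgumentException("Unknown column: " + column);
  }
}

final class SimpleRow implements DataRow {
  private final SimpleSchema schema;
  private final List<Object> values;
  SimpleRow(SimpleSchema schema, List<Object> values) { this.schema = schema; this.values = values; }
  public DataSchema schema() { return schema; }
  public List<?> values() { return values; }
  public Object get(String column) { return values.get(schema.indexOf(column)); }
  public Object get(int index) { return values.get(index); }
  public void set(String column, Object value) { values.set(schema.indexOf(column), value); }
  public void set(int index, Object value) { values.set(index, value); }
  // Assumption: with(...) mutates the row and returns it, to allow chaining.
  public DataRow with(String column, Object value) { set(column, value); return this; }
  public DataRow with(int index, Object value) { set(index, value); return this; }
}

final class MemoryTable implements DataTable {
  private final DataSchema schema;
  private final List<DataRow> rows = new ArrayList<>();
  MemoryTable(DataSchema schema) { this.schema = schema; }
  public DataSchema schema() { return schema; }
  public boolean add(DataRow row) { return rows.add(row); }
  public void clear() { rows.clear(); }
  public long size() { return rows.size(); }
  public Iterator<DataRow> iterator() { return rows.iterator(); }
}
```

With these classes, `schema.createRow().with("id", 1).with("name", "example")` builds a row that can be passed to `MemoryTable.add`. An off-heap or file-backed table would implement the same `DataTable` interface, so calling code would not need to change.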

bchapuis commented 1 month ago

@sebr72 @Drabble I'm trying to refactor the org.apache.baremaps.data.schema package. Naming these classes is quite hard and I also struggle with the package name.

I think the new naming convention (see diagram) is a bit less confusing. What do you think?

Regarding the module and package names, I am hesitating between several alternatives: dataframe, datastore, storage, etc. The difficulty is that the abstraction can be used for in-memory, off-heap, and on-disk collections/dataframes. It is also used to access many file formats (geopackage, geoparquet, shapefile, etc.) in a uniform way.

Here are a couple of alternatives (module / package):

Any ideas or suggestions?

sebr72 commented 1 month ago

@bchapuis: Following our conversation, I would suggest that we keep DataTable rather than Frame, and that ...data.schema becomes ...data.storage.

bchapuis commented 1 month ago

@sebr72 I can't add you as a reviewer for now, but let me know if you think this refactoring corresponds to the changes we discussed yesterday.

sonarcloud[bot] commented 1 month ago

Quality Gate passed

Issues
14 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud