cynkra / dm

Working with relational data models in R
https://dm.cynkra.com
Other
497 stars 50 forks source link

Explore dm/Airtable links #1224

Open maelle opened 2 years ago

maelle commented 2 years ago

Working on Airtable reminded me of dm, wondering whether that's an interesting use case.

krlmlr commented 2 years ago

Thanks, I wonder how the relationships are stored? Can we infer them?

See https://github.com/cynkra/dm/pull/485 for context.

maelle commented 2 years ago

There are IDs so I think some inferring should be possible.

Should I create a test Airtable then download all CSVs to create an example?

krlmlr commented 2 years ago

Sure, that would be very useful, also for the conf!

maelle commented 2 years ago

also noting Airtable has "views" (persistent filters etc) it somehow reminds me of zooming

maelle commented 2 years ago

When downloading CSVs via the Airtable interface, one does not get record IDs unless there's a special field added (with the formula RECORD_ID()). Links to other records are transformed to the name of that other record even if that name is not unique.

To get a more usable download one needs to use Airtable API.

krlmlr commented 2 years ago

What does the download via API look like?

maelle commented 2 years ago

Based on a test Airtable (an Airtable example that Airtable provides)

airtable <- airtabler::airtable(
  base = "BASEID", 
  tables = c("Places", "Attractions")
)
airtable$Places$select_all() |>
  glimpse()

airtable$Attractions$select_all() |>
  glimpse()
Rows: 3
Columns: 8
$ id                        <chr> "rec9Mz5uGDD8HPeYL", "recLa53iEHTHFaI…
$ Highlights                <list> <"Food", "Beer", "Art", "Music">, <"…
$ City                      <chr> "Austin", "New Orleans", "Tokyo"
$ Priority                  <chr> "Medium", "Low", "High"
$ `What I want to do there` <list> <"recmoxftWfsSxvZGh", "rect2CiDH4ueYk…
$ `Weekend Trip`            <lgl> TRUE, TRUE, NA
$ International             <lgl> NA, NA, TRUE
$ createdTime               <chr> "2015-07-12T15:52:41.000Z", "2015-07-…

and

Rows: 4
Columns: 6
$ id          <chr> "recfvYDdHjSv7v416", "recj4FckEmh76k4DP", "recmoxft…
$ Attraction  <chr> "Tokyo National Museum", "Jazz Fest", "Hamilton Poo…
$ Links       <chr> "http://www.tnm.jp/?lang=en", "http://www.nojazzfes…
$ Place       <list> "rechoERJqEUlwQZkM", "recLa53iEHTHFaIPS", "rec9Mz5…
$ Notes       <chr> "The TNM is the oldest Japanese national museum (es…
$ createdTime <chr> "2015-08-19T19:01:28.000Z", "2015-08-19T19:01:28.00…
maelle commented 2 years ago

Also how neat would it be to have an R package that, given an Airtable ID and table names (not possible to guess table names as Airtable metadata API is a paid product), would dump the whole thing in some sort of SQL database :thinking: (I'm doing an Airtable backup for a non cynkra thing -- for which I use csvs)

krlmlr commented 2 years ago

We might be able to extend airtabler to generate an in-memory dm object. If CSVs are too large, they could be mounted into a duckdb. From there we can export to any database.

Why is Attractions$Place a list? Is the Places$"What I want to do there" a reverse link?