directus / v8-archive

Directus Database API — Wraps Custom SQL Databases with a REST/GraphQL API
https://docs.directus.io/api/reference.html
507 stars · 203 forks

Add /export and /import endpoints #31

Closed rijkvanzanten closed 5 years ago

rijkvanzanten commented 6 years ago

It would be extremely cool if Directus had a native way of exporting and importing (existing) data and whole Directus setups. Moving from a locally installed Directus to hosted should/could be as easy as hitting export, logging in to hosted, and hitting import.

Different "types" of import/export?:

  • Schema only — just import/export your table setup & corresponding records in directus_* tables
  • Data only — just import/export the data
  • Everything — a full database dump. Useful for backups & migrating between servers / hosted-local

Thoughts @WellingGuzman @benhaynes ?

benhaynes commented 6 years ago

Absolutely. Being a new feature, let's make sure all the Core/existing features are built first – but then I think this would be an amazing addition to Directus. I've spoken with many clients that cite importing as a big need... and exporting really goes a long way towards our view of data portability.

rijkvanzanten commented 6 years ago

Let's also include support for exporting / importing to JSON. That also makes it pretty easy to maintain the default schema for installation. I was thinking of the following structure:

{
  "<table>": [
    { "<field>": "<value>" },
    { "<field>": "<value>" }
  ]
}

For example, the Directus installation schema / start-data could look like:

{
  "directus_fields": [
    {
      "id": 1,
      "collection": "directus_users",
      "field": "avatar",
      "type": "FILE",
      "interface": "single-file",
      "options": null,
      "locked": 1,
      "etc": "..."
    }
  ]
}

NOTE: This is only for values, not the schema itself

benhaynes commented 6 years ago

Interesting idea. SQL should come first since it's "native" to our storage... but JSON is a nice-to-have format too! And it's easier to read and create manually.

rijkvanzanten commented 6 years ago

Preferably it supports SQL (dump), JSON, and CSV formats.

Editing large datasets in Excel / Numbers and then importing them into Directus would be extremely powerful.

WoLfulus commented 6 years ago

I'd love something like this

sedatkaplan commented 6 years ago

Importing and exporting database schemas would make expanding Directus fast. Think of a database schema for accounting, ERP, or maybe school management, etc. This would also enhance your marketplace.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

benhaynes commented 5 years ago

To achieve better clarity/visibility, we are now tracking feature requests within the Feature Request project board.

This issue being closed does not mean it's not being considered.

philleepflorence commented 5 years ago

I have existing custom CSV import and export endpoints, as most of our clients using Directus have spreadsheets with hundreds of rows of data, and they also want to export data as CSV to give to accountants, managers, and so on. This could be in the pipeline soon if I can merge the custom code with the core.

https://cdn.philleepflorence.com/uploads/00000000137.jpg is what it looks like in v6.x

benhaynes commented 5 years ago

If we can define the MVP features/options I can draft a design for this. Then we can look into mapping what you come up with into Core (or as an included extension). Thanks @philleepflorence!

philleepflorence commented 5 years ago

Okay, will do!

hemratna commented 5 years ago

We will try to support the types mentioned below.

Different "types" of import/export?:

  • Schema only — just import/export your table setup & corresponding records in directus_* tables
  • Data only — just import/export the data
  • Everything — a full database dump. Useful for backups & migrating between servers / hosted-local

DB Export

I am planning to use db-dumper for the data export. This package supports MySQL and PostgreSQL.

DB Import

I didn't find any well-maintained package for DB import. At this stage, I am planning to use the below command for MySQL:

mysql -u <username> -p<PlainPassword> <databasename> < <filename.sql>

For PostgreSQL:

psql -h localhost -U user -W -d database_name -f path/to/file.sql

P.S. Still looking for some better solution.

benhaynes commented 5 years ago

Just adding one more thought here:

A big ask from Enterprise customers has been migrating between environments. So I think it would be nice if our export option had the ability to dump different things:

I guess as long as we can define different "outputs" for export to create, then we should be fine.

rijkvanzanten commented 5 years ago

Another thought: we can't rely on SQL imports for Cloud. One could inject custom SQL in there and screw up everything. (Which also means we need a way to globally disable imports from the config file.)

benhaynes commented 5 years ago

Are you recommending we use an alternate format, such as JSON? That will likely be far more difficult to build and will make importing relatively proprietary. It's nice keeping with pure SQL whenever possible... though I recognize that it has security implications if not done correctly.

For Cloud we could make Import an Enterprise-only feature. Would that help since they are isolated?

rijkvanzanten commented 5 years ago

I think it's most flexible / secure to have a dedicated structure for Directus. It allows us to choose what to export / import, and it allows us to do some validation on the schema before injecting it. A full database dump can just as easily be done directly from the database. The real value Directus can add here comes from the fact that Directus is able to read and manipulate the to-be-imported data.

benhaynes commented 5 years ago

One other point: we might want to consider making this a "module" extension so it's outside of the core logic. There might be many different ways to accomplish this feature... and an extension gives us flexibility and avoids potential breaking changes in the future.

philleepflorence commented 5 years ago

Some notes: for importing into a shared environment, the import can be allowed only if the user already exists in the DB and is an admin. I can share some logic on this. The only thing we do not allow is a fresh import by an anonymous user. If the user is an admin and authenticated, there are ways to restrict imports to the same DB and/or "project".

Making it an extension is definitely more efficient and safer than adding to the core.

hemratna commented 5 years ago

@philleepflorence As you suggested, the /import and /export endpoints are admin-only access, similar to the settings.

mattgaspar commented 5 years ago

I wanted to recommend an approach based on experiences with some other systems.

The primary method of import/export is CSV files. CSV works very well because it is well supported by many applications/services, and the data can be used by anyone with a spreadsheet editor.

Exporting: Any list view or search result can be exported to CSV by clicking an icon. This is nice because it allows exporting all of the data or only the specific data that is needed. So for Directus, a filtered view would export the data returned by the filtered search and include only the columns that are selected. Also include the database record id.

Importing: In the first step of the import process, you select the database table and upload the CSV file. When importing, you have the option to process the CSV data as Create, Update, or Both (for "Both", if the id column on a CSV row is not blank it is an update, otherwise it is an insert).

All inserts or edits from the CSV import are processed as if they were made through other methods (UI or API), meaning they trigger validation and/or other events set up on the table. For Directus, this would mean processing each row of the CSV as if it were a request from the API and calling any actions and filters for the collection (whether these are applied to the import could be a user choice). This would also run the import using the permissions of the current user, so importing could be allowed for any user.

For error handling, the entire CSV is processed, and if any rows fail, a result CSV is generated that includes only the failed rows plus an added reason column. This allows easily fixing the bad data and re-attempting only those records.

benhaynes commented 5 years ago

Thank you @mattgaspar — this is a very clear and concise description of an ideal import/export flow... I love it! I've added this as a priority feature since I think it's one of the more useful things for all users to have.

One thing we would need to consider is how relational data is imported/exported. I've drafted a rough design using the upcoming Page Info Sidebar Components, but this could also be an Action Button in the page header that opens a modal:

[Design mockup: Directus 8 – Browse Info Sidebar — Import Export]

mattgaspar commented 5 years ago

Looks good!

Some other functionality that would be useful for the import to make it user friendly:

After clicking import, read the first row of the CSV, which defines the column names (this could be server-side, or possibly done on the client, though that could be problematic for very large files; some simple validations of the CSV file could also happen here). Allow the user to select which columns they want to import and allow mapping each CSV column name to a field in Directus. It could provide default mappings when column names match fields in the collection.

I find this type of functionality useful for doing mass updates: I can export the data set, update it in Excel, and then import only a specific column without needing to remove the extra columns from the CSV.

For "many to one" fields, you'll have to decide whether to allow the user to use text, which would require a query to identify the id to use. My preference is that the mapping interface lets you choose. But there might also need to be a way to include the ids for these fields when exporting.

mattgaspar commented 5 years ago

Just thought of another issue that will probably be the trickiest part of dealing with CSV files: every value in a CSV file is basically a string, so based on the field type, each value will need to be converted to the correct data type. For arrays, you can treat the cell value as its own comma-separated list!

philleepflorence commented 5 years ago

@mattgaspar can you send me a sample CSV of what you mean? I can test it against the CSV import extension/function I installed on v6.x. I think it accommodated escaped quotes, but it would help to test so the conversion method can be updated. The only issue I had was with the Alias columns: it exported INT values, and it was tricky getting the relationship data back into the DB when importing.

philleepflorence commented 5 years ago

> One thing we would need to consider is how relational data is imported/exported. I've drafted a rough design using the upcoming Page Info Sidebar Components, but this could also be an Action Button in the page header that opens a modal:

The way I did it in the last version was to export the raw alphanumeric values, as the user would need to export the junction collection(s) both for the data to make sense in a spreadsheet and to facilitate re-importing.

Also, are we going the route of making this an admin endpoint, an extension, or a core feature accessible from any collection? I think admin endpoints would make it more of a module (a Vue component), so it would be easy to debug or update.

I can port the v6 CSV extension over if @benhaynes is down to collaborate on wire-framing and storyboarding.

mattgaspar commented 5 years ago

@philleepflorence does your CSV import functionality operate against the database directly, or does it process the update through the Directus core so that per-row permissions are verified and any hooks are called?

philleepflorence commented 5 years ago

@mattgaspar it works through the Directus Zend core via the authenticated user's credentials with a depth of 0; it simply converts the JSON to CSV after retrieving the response.

The same goes for imports.

That way, the actual DB type doesn't matter!

mattgaspar commented 5 years ago

@philleepflorence sorry if I'm off base. I'm new to Directus and don't know much about Zend, but I thought it was mostly used to generate the SQL queries. Are the role-based permissions implemented in the Zend component? I didn't think they were implemented at the database level. I'm referring to the status-based workflow permissions.

philleepflorence commented 5 years ago

@mattgaspar oh okay, yeah it uses the Directus core DB function to fetch or update items, i.e. just like using the API.

benhaynes commented 5 years ago

Hey @mattgaspar — we would definitely want all imports/exports to go through the full API so that permissions (etc) are honored.

> Just thought of another issue that will probably be the trickiest part of dealing with CSV files: every value in a CSV file is basically a string, so based on the field type, each value will need to be converted to the correct data type.

This shouldn't be an issue, since we'll confirm the column/field mapping... and each field has a set datatype in Directus. That means we'll know how to cast each value (e.g. string, int).

@philleepflorence — I see this as a core set of API endpoints that can then be used directly on the Core App listing pages... and there could also be a dedicated module (extension) for power-user or global exports/imports. Does that make sense? @rijkvanzanten @bjgajjar do you agree?

philleepflorence commented 5 years ago

@benhaynes it makes sense... although it might complicate things compared to a strict module for admins, or for those with permissions to that endpoint. If it's part of the core, would anyone with read permission be able to export? I assume import would require update or create permission. But I'll let @rijkvanzanten and @bjgajjar comment first, since they know the intricacies of the other endpoints.

If an admin endpoint, that is something I can easily tackle over a weekend since I already have most of the code for the CSV extension.

benhaynes commented 5 years ago

True, an admin page would be easier to build... but I think there are a lot of users who would take advantage of an import/export feature. And there's no reason to limit it to Admins if we can have everything safely run through permissions. Also, we would want to avoid direct SQL queries since those don't trigger our accountability logic (change log).

Perhaps we can apply a similar approach to the endpoints so this is a solid extensible feature.

binal-7span commented 5 years ago

> @bjgajjar do you agree?

LGTM. Let @rijkvanzanten provide his thoughts :)

rijkvanzanten commented 5 years ago

Having it as an option in the item browse page's sidebar would be more useful.

benhaynes commented 5 years ago

I agree — if the users have permission to access (READ) the data, there's no reason not to let them export it in alternate formats. Similarly, if they have CREATE/UPDATE permissions, they should be able to import data. This is a very useful feature for average users who might have Spreadsheets (or other external apps that save in CSV) as part of their workflow.

lluishi93 commented 5 years ago

In fact, this would be a nice and easy way to migrate data from other systems (because most of them already have similar functionalities)

yagobski commented 4 years ago

Any progress on the view side of this feature? A lot of us Directus users are waiting for an export-to-CSV solution.

rijkvanzanten commented 4 years ago

No new updates yet. I know there are a lot of people who want this, but there are also a lot of people who want bugs fixed, or want other features, ~or want me to get them coffee~ want better integrations, etc. We're doing what we can with the resources we have, but I have to prioritize bugs, general stability, and adding tests before anything else.

@yagobski if you wanna help out, it would be great if you could write out a spec (rfc) for how this would ideally work for your use case 😊 (as detailed as possible).

philleepflorence commented 4 years ago

In the meantime, until this is added to the core, there are extensions for CSV import and export here:

https://github.com/philleepflorence/directus-8-extensions

yagobski commented 4 years ago

Thanks, @philleepflorence. I tried your extensions but they don't seem to work with Directus 8. The best case for me is just an "Export to CSV" button near the table view that exports the data of the columns I see in the table. That means if I apply filtering or ordering, that is exactly the data I get in the CSV.

benhaynes commented 4 years ago

Another reason this feature isn't getting AS much attention as it should is that you can technically work around it directly in the database (since Directus is pure SQL). Other features/bugs are Directus-specific and only work if we fix/add them... but this can be done with a database administration tool (less convenient, I know). Just trying to add some context here.

darioseidl commented 4 years ago

Is it okay if I ask here? I'm setting up Directus for one of our projects and want to share the schema with other devs.

> Schema only — just import/export your table setup & corresponding records in directus_* tables

It's fine for me to do that with mysqldump, but I'm not quite sure which directus_* tables would be considered part of the schema, and which are "data". Has anyone worked that out yet?

So far, I would dump all directus tables with data, except for directus_activity, directus_files, directus_revisions, and directus_user_sessions, which I would dump with mysqldump --no-data:

# make a dump of directus tables
mysqldump db directus_collection_presets directus_collections directus_fields directus_folders directus_migrations directus_permissions directus_relations directus_roles directus_settings directus_users directus_webhooks > schema/db-schema-dump.sql

# make a dump of directus tables, without data
mysqldump --no-data db directus_activity directus_files directus_revisions directus_user_sessions >> schema/db-schema-dump.sql

# make a dump of all non-directus tables, without data
mysqldump --no-data db $(mysql --batch --silent --execute "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'db' AND TABLE_NAME NOT LIKE 'directus\_%';") >> schema/db-schema-dump.sql

rijkvanzanten commented 4 years ago

@darioseidl While the exact split between "data" and "schema" is a little vague, this is a rough overview (all collections are prefixed by directus_):

Collection Description
activity Activity log; who did what when
collection_presets Last state of layouts and saved bookmarks
collections What collections exist / are managed
fields Fields and their configuration
folders Folders to group files *
migrations Keeps track of what migrations have run
permissions What the user roles are allowed to access
relations How certain fields relate to each other
revisions Stores item deltas on crud operations
roles What user roles exist in the platform
settings key/value of general settings
user_sessions Which users are currently logged in
users Users and their login info
webhooks Webhooks and when they fire

* Not yet in use

As for which ones to migrate, I'd generally recommend all of them, in order to prevent mismatches in foreign keys between the tables. That being said, collections, fields, and relations are definitely required for "schema" stuff (without those, you'd lose info like which interface is used, the order on the form, etc.).

users / roles / permissions could be truncated, but then you'd have no way to log in, so make sure you keep at least one active user, and that the admin/public roles are present in roles.

The others can technically be empty, and Directus will still work with the schema in place.

darioseidl commented 4 years ago

Thanks @rijkvanzanten ! That helps me a lot to understand the tables.

I was worried about mismatched foreign keys, but so far the above script (skipping the data in directus_activity, directus_files, directus_revisions, directus_user_sessions) has worked well for me.

ThaDaVos commented 4 years ago

As the core will be moved to Laravel, maybe this can help: https://docs.laravel-excel.com/3.1/getting-started/

I was also hoping something like this would be implemented already. Currently I'm using Directus for translations only, using the API endpoints to extract them and then transform them into an i18n-compatible layout. Next step: do it in reverse :joy:

sweetyams commented 4 years ago

+1 for this feature - would be extremely helpful!

Russ93 commented 4 years ago

Has there been any work done on this? It would be very helpful to export the data on the screen to a TSV or a CSV.

benhaynes commented 4 years ago

This has been added in v9, but will not be back-ported to v8 (unless done so by the community).

tchapi commented 4 years ago

Hi @benhaynes Do you have a roadmap for v9 that includes the import/export feature?

Thanks a lot!

benhaynes commented 4 years ago

There's no roadmap yet, but we're working on adding one. Export and backup are already in v9, import might come a bit later.