adobe / da-admin


feat - Enable Word Document Uploads #23

Closed andreituicu closed 4 months ago

andreituicu commented 6 months ago

Description

Allow Docx word uploads to DA, by relying on the:

The Source endpoint converts the .docx into semantic HTML and uploads it to DA storage, together with its images, in the same folder.
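
To make "in the same folder" concrete, here is a minimal sketch of how the storage keys could be derived from the uploaded path. It is not the code from this PR: `planDocxUpload`, the `DocxUploadPlan` shape, the `.html` naming, and the image name `image1.png` are all assumptions for illustration.

```typescript
// Hypothetical helper, not part of da-admin: given the key of an uploaded
// .docx and the names of images extracted from it, work out where the
// converted HTML and the images would land (assumed: same folder as the .docx).
interface DocxUploadPlan {
  htmlKey: string; // assumed name for the converted document
  imageKeys: string[]; // extracted images, stored next to the HTML
}

function planDocxUpload(sourceKey: string, imageNames: string[]): DocxUploadPlan {
  const extension = ".docx";
  if (!sourceKey.toLowerCase().endsWith(extension)) {
    throw new Error(`Not a .docx key: ${sourceKey}`);
  }
  // Everything up to and including the last slash is the folder prefix.
  const slash = sourceKey.lastIndexOf("/");
  const folder = slash === -1 ? "" : sourceKey.slice(0, slash + 1);
  // File name without the .docx extension.
  const baseName = sourceKey.slice(slash + 1, -extension.length);
  return {
    htmlKey: `${folder}${baseName}.html`,
    imageKeys: imageNames.map((name) => `${folder}${name}`),
  };
}

const plan = planDocxUpload(
  "andreituicu/da-test/testfolder/document.docx",
  ["image1.png"],
);
console.log(plan.htmlKey, plan.imageKeys);
```

Keeping the images next to the HTML means the converted document can reference them with relative paths, which is one plausible reason for the same-folder layout.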

example: document.docx

curl -X PUT -F "file=@document.docx" http://localhost:8787/source/andreituicu/da-test/testfolder/document.docx
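
Since the DA live UI would probably need to send the same request, here is a rough equivalent of the curl call using `fetch` and `FormData`. The helper names `buildSourceUrl` and `uploadDocx` are made up for this sketch; only the `PUT` method, the `file` form field, and the `/source/{org}/{repo}/{path}` layout are taken from the curl example.

```typescript
// Hypothetical helpers mirroring the curl example; names are illustrative only.
function buildSourceUrl(base: string, org: string, repo: string, path: string): string {
  const cleanBase = base.replace(/\/+$/, ""); // drop trailing slashes
  const cleanPath = path.replace(/^\/+/, ""); // drop leading slashes
  return `${cleanBase}/source/${org}/${repo}/${cleanPath}`;
}

async function uploadDocx(url: string, file: Blob, filename: string): Promise<number> {
  const body = new FormData();
  // Same field name as curl's -F "file=@document.docx".
  body.append("file", file, filename);
  // No explicit Content-Type header: fetch sets the multipart boundary itself.
  const response = await fetch(url, { method: "PUT", body });
  return response.status;
}

const url = buildSourceUrl(
  "http://localhost:8787",
  "andreituicu",
  "da-test",
  "testfolder/document.docx",
);
console.log(url);
```

In a browser, `uploadDocx(url, fileFromDropEvent, "document.docx")` would then send the dropped file to the Source endpoint, assuming the endpoint accepts it the same way it accepts the curl request.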

result:

Motivation and Context

This will allow authors to easily upload Word documents into DA, whether created locally, received via email, or started in SharePoint, and continue to edit, preview, and publish them in DA.

Note: Will probably require a change in the DA live UI

How Has This Been Tested?

Locally, using the Word document and curl request from above.


andreituicu commented 6 months ago

Looks like I need to merge the main branch and catch up with the latest changes from there, but I wanted to open the PR first, now that I have it in a working state, to see what you guys think about it. 🙂

codecov[bot] commented 6 months ago

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (df11d30) 100.00% compared to head (e53a97e) 100.00%.

Additional details and impacted files

```diff
@@            Coverage Diff            @@
##              main       #23   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files            4         4
  Lines          219       224    +5
=========================================
+ Hits           219       224    +5
```


andreituicu commented 6 months ago

Looks like I need to merge the main branch and catch up with the latest changes from there

Done.

Fixed linting.

Not sure about the code coverage report: it is at 100%, but I didn't add tests for the new functionality. 🙂

auniverseaway commented 6 months ago

This is really great. It also raises some interesting questions about what we want to solve in DA Admin.

I'd be curious what the team (@karlpauls, @mhaack, @trieloff, @chrischrischris) has to say here, too.

I think DA has three migration customers:

  1. AEM WYSIWYG
  2. AEM EDS on SharePoint / Google Drive
  3. WordPress / Squarespace / Wix / etc.

For me the question is whether we want DA Admin to support Word docs through the entire lifecycle of a site, or whether we should treat the Word import as a one-off migration task. I could argue both. I can imagine a world where someone has a Word doc and they just want to drag it into DA and it just works. If you think about this flow, this is what needs to happen:

  1. Download from SharePoint to local computer.
  2. Drag onto DA via @mhaack's D2D implementation.
  3. Delete file off computer.

Is this easier than CMD+A, CMD+C, CMD+V? Probably not, but you also get free asset uploading with this PR. Is that worth the maintenance of the code?

I lean towards thinking this should be a separate project that's used as part of a migration, but I could also be talked into keeping this in DA Admin as a fully supported feature. Either way, I would probably leave this here as a branch until the need arises. I'd prefer we don't merge features before we have real-world use cases that are blocking projects.

@andreituicu do you have a target project in mind that could use this today?

andreituicu commented 6 months ago

do you have a target project in mind that could use this today?

The idea for this feature came from the customer project that I am currently working on. (Can we name customers and describe their workflow here in "public"? 🙂) I would not pitch DA to this particular customer at this stage, but I think they will not be the only one with this feedback.

I think DA has three migration customers:

  1. AEM EDS on SharePoint / Google Drive

I lean towards thinking this should be a separate project that's used as part of a migration

Working on that project, I came to the realisation that SharePoint/Google Drive -> DA is not a one-time migration, and that it is not actually SharePoint -> DA, but rather Word -> DA. What I've seen working with this specific customer is that they work much more with offline Word documents than I would've imagined.

Examples:

These examples are day-to-day operations rather than one-time migrations, which made me super happy at the time to be able to tell the customer: "Sure, no problem, drag and drop that into Microsoft SharePoint and you are ready to continue from there."

The example of receiving a Word document via email is one we saw with @mhaack at a second customer, whom we weren't able to convince to simply share a blank document in SharePoint (here, DA) with the original content creators, so they could write their content there directly.

Is this easier than CMD+A, CMD+C, CMD+V

This was actually my first suggestion, before drag and drop, but the feedback I received was that dragging and dropping a Word document is preferable to CMD+A, CMD+C, CMD+V. Drag and drop makes authors feel like things are natively supported and work smoothly, whereas copy-paste makes them feel like they are transcribing from one format to another. The parallel that this specific author drew was with the AEM Classic Editor, where they also copy and paste from the Word document they already have.

My expectation would also have been that CMD+A, CMD+C, CMD+V is generally easier, and for me personally it is, but it looks like other people are more used to "I have a Word document already, I can drag and drop it to upload it", because SharePoint, Google Drive, Dropbox, etc. have made this routine popular. I can also see the point that as soon as there is more than one document, dragging and dropping 2+ documents to upload becomes much more appealing than creating empty documents and copy-pasting.

All of this made me think that native support for Word documents throughout the DA lifecycle is in line with the "meeting users where they are" principle that we follow.

Actually, the next thing that I was looking at was a "Download as Docx" option, to be able to download any DA document as .docx to send by email, work offline, etc. 🙂

auniverseaway commented 4 months ago

@andreituicu We've let this sit out here, but I think we finally have a path forward.

@trieloff brought up a good point that adding dependencies is going to affect our cold start times.

What I propose is that we create a new project called da-tools that solves the types of problems you're seeing. I think there's a really good story for this feature, but not at the expense of the speed of da-admin.

I see da-tools expanding, too:

  1. Word conversion
  2. Other converters (MD, JPG, whatever)
  3. Link checking
  4. Etc.

For now, I would close this PR, and when we have a customer that is blocked by this, we ship it in da-tools. You've already done all the hard parts, we just need a customer to say, "I need this now."

WDYT?

andreituicu commented 4 months ago

@auniverseaway Sure! I have no concerns. Closing it for now. I'll leave the branch so we can pick up the code when we need it.