[2D MVP] Milestone 2 - Process drawing files and store them as images

carmenfan commented 1 month ago

Description

This is part of https://github.com/3drepo/3D-Repo-Product-Team/issues/378

We need to actually process the drawings and convert them into svg/pngs.

End points

Get image file

endpoint: GET /teamspaces/{teamspace}/projects/{project}/drawings/{drawing}/revisions/{revision}/files/image
description: Get the processed image of the 2d drawing
permissions: viewer+
response: binary of the svg/jpg/png

NOTE: The image file could be svg or any other image file format - so it's important to make sure we get the mimeType correct to communicate to the frontend what file format this is.
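A minimal sketch of how the handler might take the content type from the stored file reference rather than guessing it from the file name. `buildImageResponse` and the fallback to `application/octet-stream` are assumptions for illustration; only the `mimeType` field comes from the `drawings.history.ref` example in this issue.

```javascript
// Hypothetical helper: not part of the existing code base.
// `ref` mirrors the drawings.history.ref document shown in this issue.
const buildImageResponse = (ref, data) => {
  if (!ref || !data) {
    return { status: 404, headers: {}, body: undefined };
  }
  return {
    status: 200,
    headers: {
      // Assumption: fall back to a generic binary type if mimeType is missing.
      'Content-Type': ref.mimeType || 'application/octet-stream',
      'Content-Length': String(data.length),
    },
    body: data,
  };
};

const svg = Buffer.from('<svg xmlns="http://www.w3.org/2000/svg"/>');
const res = buildImageResponse({ mimeType: 'image/svg+xml' }, svg);
console.log(res.headers['Content-Type']); // image/svg+xml
```

Keeping this as a pure function means the frontend-facing headers can be checked without starting the web service.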

Get drawing thumbnail

endpoint: GET /teamspaces/{teamspace}/projects/{project}/drawings/{drawing}/revisions/{revision}/files/thumbnail
description: Get the processed thumbnail of the 2d drawing
permissions: viewer+
response: binary of the png/jpg

NOTE: The image should be the processed, small version of the drawing that will satisfy the resolution required for the drawing list (check figma)

Processing Drawings

Drawing processing follows a similar process to 3D file processing, except that PDFs will be handled directly by the NodeJS web service.

flowchart TD

    subgraph Queue Service
        Rabbitmq(callbackq)
    end
    subgraph API Service
        B{File type?}
        route[Upload revision endpoint]
        C[Extract the 1st page of the PDF]    
        PDFSuccess{Success?}
    end

    subgraph Model Processing Service
        D[Place task in queue]
    end

    subgraph bouncerworker
    onTask[New task from queue]
    bouncerProcess(Convert DWG to SVG)    
    bouncerComplete{Success?}    
    end

    User((User)) -->|Upload New Revision| route
    B -->|DWG| D
    route --> B
    B -->|PDF| C
    C --> PDFSuccess
    PDFSuccess -->|Status: ok| Rabbitmq        
    D --> onTask[New task from queue]
    onTask --> bouncerProcess
    bouncerComplete -->|Status: failed| Rabbitmq
    bouncerComplete -->|Status: ok| Rabbitmq
    bouncerProcess --> bouncerComplete
    C -->|Status: processing| Rabbitmq
    D -->|Status: queued| Rabbitmq
    bouncerProcess --> |Status: processing| Rabbitmq
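The file type branch in the flowchart could be sketched roughly as follows. The helper names and the shape of the return value are hypothetical; the `processing` and `queued` statuses are the ones shown on the flowchart edges.

```javascript
// Hypothetical routing helper; names and return values are illustrative only.
const getExtension = (filename) => {
  const dot = filename.lastIndexOf('.');
  return dot > 0 ? filename.slice(dot + 1).toLowerCase() : '';
};

const routeUpload = (filename) => {
  switch (getExtension(filename)) {
    case 'pdf':
      // Handled directly by the API service.
      return { handler: 'api', initialStatus: 'processing' };
    case 'dwg':
      // Placed on the queue for a bouncer worker to pick up.
      return { handler: 'queue', initialStatus: 'queued' };
    default:
      throw new Error(`Unsupported drawing format: ${filename}`);
  }
};

console.log(routeUpload('plan.PDF').handler); // api
console.log(routeUpload('site.dwg').handler); // queue
```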

PDF processing

PDF processing should happen in NodeJS instead of being put on the queue and serviced by 3drepobouncer. This is because of the abundance of PDF reading libraries available in NodeJS, which should allow us to process a file with ease. There are many libraries available that do slightly different things, so we will need a technical analysis of which one best suits our needs. I found the following from a quick Google search:

pdf extractor
pdf2pic
inkscape
and potentially more!

We will only be dealing with the first page of the PDF, and ideally we want to convert that into an SVG. Failing that, it should be a raster image (e.g. png, jpg).

Failing all that, the drawing should be considered as having failed to process, and the error should be returned to the user with an appropriate reason.
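The SVG-first, raster-second, fail-last order described above could be sketched as below. The converters are injected as hypothetical functions (`toSvg`, `toPng`) so the sketch does not assume any particular PDF library; the result shape is also an assumption.

```javascript
// Hypothetical converters are injected; each takes a Buffer holding the PDF
// and resolves to a Buffer holding the converted first page.
const convertFirstPage = async (pdf, { toSvg, toPng }) => {
  const attempts = [
    { mimeType: 'image/svg+xml', convert: toSvg },
    { mimeType: 'image/png', convert: toPng },
  ];
  const errors = [];
  for (const { mimeType, convert } of attempts) {
    try {
      const data = await convert(pdf);
      return { status: 'ok', mimeType, data };
    } catch (err) {
      // Remember why this format failed, then try the next one.
      errors.push(`${mimeType}: ${err.message}`);
    }
  }
  // Every conversion failed: report why so the user gets a meaningful error.
  return { status: 'failed', reason: errors.join('; ') };
};

// Usage with stub converters: SVG conversion fails, PNG succeeds.
const stubs = {
  toSvg: async () => { throw new Error('vector export unsupported'); },
  toPng: async () => Buffer.from('png-bytes'),
};
convertFirstPage(Buffer.from('%PDF-1.7'), stubs)
  .then((result) => console.log(result.status, result.mimeType));
```

Returning the `mimeType` alongside the data keeps the information needed for the file reference in one place, whichever format ends up being produced.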

NOTE: a failed revision should also be recorded, with the original file stored (So we get closer to https://github.com/3drepo/3D-Repo-Product-Team/issues/446)

See data storage section to determine where files and references should be stored.

DWG processing

DWG processing will require the ability to read DWG, which lies within 3drepobouncer. This should follow a similar process to 3d models where the task is submitted to the queue for bouncer workers to pick up.

A slight variation is that instead of putting the drawing file in the common sharespace, ideally we want to first commit the revision record (marked as incomplete) and pass only the revision ID as the reference. The bouncer should then pull the file from the fileshare instead. (This should hopefully get us closer to https://github.com/3drepo/3D-Repo-Product-Team/issues/446)

See data storage section to determine where files and references should be stored.
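A rough sketch of the "commit first, queue the ID" idea, assuming nothing about the real queue or database clients: a `Map` stands in for `drawings.history` and an array stands in for the queue. The message fields and the boolean `incomplete` flag are assumptions for illustration.

```javascript
// Hypothetical sketch: `revisions` (a Map) stands in for drawings.history and
// `queue` (an array) for the message queue. The message fields are assumed.
const submitDwgRevision = ({ teamspace, project, drawing, revisionId }, revisions, queue) => {
  // 1. Commit the revision record first, flagged as incomplete.
  //    (A boolean is used here for simplicity; the real flag may differ.)
  revisions.set(revisionId, {
    _id: revisionId, project, model: drawing, format: 'dwg', incomplete: true,
  });
  // 2. Queue identifiers only; the worker pulls the file from the fileshare.
  const message = { teamspace, project, drawing, revision: revisionId };
  queue.push(JSON.stringify(message));
  return message;
};

const revisions = new Map();
const queue = [];
submitDwgRevision(
  { teamspace: 'acme', project: 'p1', drawing: 'd1', revisionId: 'r1' },
  revisions,
  queue,
);
console.log(queue[0]);
```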

I'm not sure if we should put this task in modelq, or jobq. This very much depends on the amount of memory and time it takes to process. @sebjf perhaps once we have an implementation we can run through a few files of various sizes and see what the memory/cpu utilisation is like?

Data storage

Revision records

The image reference should be stored as part of the revision record in drawings.history:

// Example drawing revision
{
    "_id" : LUUID("cad0c3fe-dd1d-4844-ad04-cfb75df26a63"),
    "project": LUUID("bfe0c3fe-dd1d-4844-ad04-cfb75df26a63"),
    "model": LUUID("aeb0c3fe-dd1d-4844-ad04-cfb75df26a63"),
    "author" : "carmen",
    "statusCode" : "S1",
    "revCode": "abc,
    "timestamp" : ISODate("2023-11-28T19:11:55.000Z"),
    "format": "dwg",
    "rFile" : [ 
        LUUID("cad0c3fe-dd1d-4844-ad04-cfb75df26a63")
    ],
    "image": {

        LUUID("cad0c3fe-dd1d-4844-ad04-cfb75df26a63")
    }

}

NOTE:

Unlike containers, every drawing within a teamspace will be stored in the same collection, with project/model properties to denote which project/model they belong to.
Ensure we update the delete drawing function so that it also removes revisions.
rFile should link to the original files; each entry should be a reference to a file. The reference should be stored in drawings.history.ref
- The original filename should be metadata within the reference, instead of the <UUID>_original_file_name approach taken in containers.
- We will record the file extension in the object instead of having to regex it out of the rFile string like in containers.
- This will also contain the incomplete and void properties, like containers, to denote the state of the model processing.
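To illustrate the notes above, here is a small sketch that records the extension on the revision and keeps the original filename as metadata on the reference. `describeUpload` and the `name` field are hypothetical; the issue does not specify where the filename metadata should live.

```javascript
// Hypothetical helper; the `name` field is an assumed place for the metadata.
const describeUpload = (originalName, refId, revId) => {
  const dot = originalName.lastIndexOf('.');
  // Record the extension once, so nothing has to regex it out later.
  const format = dot > 0 ? originalName.slice(dot + 1).toLowerCase() : '';
  return {
    revision: { _id: revId, format, rFile: [refId] },
    ref: { _id: refId, type: 'fs', name: originalName, rev_id: revId },
  };
};

const { revision, ref } = describeUpload('Ground Floor.DWG', 'ref1', 'rev1');
console.log(revision.format, ref.name); // dwg Ground Floor.DWG
```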

Image file

The processed image file should be stored as part of the revision and also be stored in drawings.history.ref

// drawings.history.ref
{
    "_id" : LUUID("0ac71fb2-a18d-4ef8-b5a8-b87879326300"),
    "type" : "fs",
    "link" : "197/204/f959eda6-3fde-436a-ba0e-b1aa777a413d",
    "size" : 2419670,
    "mimeType": "image/svg+xml", 
    "project": LUUID("bfe0c3fe-dd1d-4844-ad04-cfb75df26a63"),
    "model": LUUID("aeb0c3fe-dd1d-4844-ad04-cfb75df26a63"),
    "rev_id" : LUUID("cad0c3fe-dd1d-4844-ad04-cfb75df26a63"),
}

(sebjf changed rid to rev_id)

Goals

[ ] As a viewer+ I want to be able to get an image representation of the drawing container

Tasks

tbd

sebjf commented 1 month ago

Hi @carmenfan,

Regarding the revision schema, is revCode the same thing as tag? (I am thinking about model and drawing revisions sharing a base class if possible.)

carmenfan commented 1 month ago

> Hi @carmenfan,
>
> Regarding the revision schema, is revCode the same thing as tag? (I am thinking about model and drawing revisions sharing a base class if possible.)

Yes and no. It serves a similar purpose, but with a different validator. I think it can share the same base class, but tag will never be stored for drawings.

For the purpose of drawings I want to tweak the responsibility a bit (and this is something I want containers to follow as well). I'm thinking that the backend should create the revision history entry, and bouncer just updates it for whatever purpose (e.g. appending the reference to the image).

So the message coming in from the queue will tell you the teamspace/project/drawing ID and revision ID, and you'd fetch the drawing by reading the revision entry that already exists. It is then no longer bouncer's responsibility to write all of this.
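A sketch of that split of responsibilities from the worker's side, written in JavaScript purely for illustration (bouncer itself is not a NodeJS service). Maps stand in for `drawings.history` and `drawings.history.ref`; the message shape, `completeRevision`, and clearing `incomplete` on success are all assumptions.

```javascript
// Hypothetical worker-side sketch. Maps stand in for drawings.history and
// drawings.history.ref; the message shape is an assumption.
const completeRevision = (rawMessage, revisions, refs, image) => {
  const { revision } = JSON.parse(rawMessage);
  // The backend has already created this entry; the worker only updates it.
  const entry = revisions.get(revision);
  if (!entry) {
    throw new Error(`Revision ${revision} not found`);
  }
  // Store the file reference, then link it from the existing revision entry.
  refs.set(image.refId, {
    _id: image.refId,
    type: 'fs',
    link: image.link,
    size: image.size,
    mimeType: image.mimeType,
    rev_id: revision,
  });
  entry.image = image.refId;
  delete entry.incomplete; // assumption: clearing the flag marks completion
  return entry;
};

const revisions = new Map([['r1', { _id: 'r1', format: 'dwg', incomplete: true }]]);
const refs = new Map();
const updated = completeRevision(
  JSON.stringify({ teamspace: 'acme', project: 'p1', drawing: 'd1', revision: 'r1' }),
  revisions,
  refs,
  { refId: 'img1', link: 'path/to/file', size: 1024, mimeType: 'image/svg+xml' },
);
console.log(updated);
```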

sebjf commented 1 month ago

Hi @carmenfan, in the schema above rFile is an array. Are we expecting multiple drawings per revision, and if so, should the importer create one entry in image for each in rFile (under the same uuid?)

carmenfan commented 1 month ago

> Hi @carmenfan, in the schema above rFile is an array. Are we expecting multiple drawings per revision, and if so, should the importer create one entry in image for each in rFile (under the same uuid?)

As it stands we'd only expect 1 drawing per upload, so 1 image.

I did think about this. 3D containers store it as an array (in anticipation of texture files or different file formats etc.), and I decided to just copy that, so that if we want to do something similar in the future we don't have to change the type in the schema.

sebjf commented 1 month ago

Hi @carmenfan, bouncer contains the definition REPO_NODE_REVISION_ID as "rev_id". Do you want a new definition for "rid" or shall we change the (ref) schema above to use "rev_id"?
