3drepo / 3drepo.io

3D Repo web server
http://www.3drepo.io
GNU Affero General Public License v3.0

[2D MVP] Milestone 2 - Process drawing files and store them as images #5006

Open carmenfan opened 1 month ago

carmenfan commented 1 month ago

Description

This is part of https://github.com/3drepo/3D-Repo-Product-Team/issues/378

We need to actually process the drawings and convert them into svg/pngs.

End points

Get image file

endpoint: GET /teamspaces/{teamspace}/projects/{project}/drawings/{drawing}/revisions/{revision}/files/image
description: Get the processed image of the 2D drawing
permissions: viewer+
response: binary of the svg/jpg/png

NOTE: The image file could be SVG or any other image format, so it's important to get the mimeType right: it is how we tell the frontend which file format it is receiving.
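To illustrate the mimeType point, here is a minimal sketch that derives the response headers for the image endpoint from the stored ref entry. It assumes the ref carries `mimeType` and `size` as in the drawings.history.ref example further down; the helper names and the `application/octet-stream` fallback are hypothetical choices, not part of the spec.

```javascript
// Sketch: response headers for GET .../files/image, derived from the stored
// drawings.history.ref entry. Helper names and the octet-stream fallback
// are assumptions for illustration.
const FALLBACK_MIME_TYPE = "application/octet-stream";

function isImageMimeType(value) {
  return typeof value === "string" && value.startsWith("image/");
}

function imageResponseHeaders(ref) {
  const headers = {
    // Use the mimeType recorded at processing time so the frontend knows
    // whether it is receiving SVG, PNG, JPEG, etc.
    "Content-Type": isImageMimeType(ref.mimeType) ? ref.mimeType : FALLBACK_MIME_TYPE,
  };
  // Only advertise a length when the ref actually records one.
  if (Number.isInteger(ref.size)) {
    headers["Content-Length"] = String(ref.size);
  }
  return headers;
}

const svgHeaders = imageResponseHeaders({ mimeType: "image/svg+xml", size: 2419670 });
console.log(svgHeaders["Content-Type"]);
```

With Node's built-in `http` module this would be used as `res.writeHead(200, imageResponseHeaders(ref))` before streaming the file from the fileshare.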

Get drawing thumbnail

endpoint: GET /teamspaces/{teamspace}/projects/{project}/drawings/{drawing}/revisions/{revision}/files/thumbnail
description: Get the processed thumbnail of the 2D drawing
permissions: viewer+
response: binary of the png/jpg

NOTE: The thumbnail should be a small, processed version of the drawing at the resolution required by the drawing list (check Figma).
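Producing the thumbnail mostly comes down to fitting the drawing into a bounding box without distorting it. A minimal sketch, assuming a hypothetical 400x300 box as a stand-in for whatever resolution the Figma designs specify:

```javascript
// Sketch: scale a drawing's dimensions to fit a thumbnail bounding box while
// preserving aspect ratio. THUMBNAIL_MAX is a hypothetical placeholder; the
// real target resolution should come from the Figma designs.
const THUMBNAIL_MAX = { width: 400, height: 300 };

function thumbnailSize(width, height, max = THUMBNAIL_MAX) {
  if (width <= 0 || height <= 0) {
    throw new RangeError("dimensions must be positive");
  }
  // Take the tighter of the two ratios, and never upscale small drawings.
  const scale = Math.min(max.width / width, max.height / height, 1);
  return {
    width: Math.max(1, Math.round(width * scale)),
    height: Math.max(1, Math.round(height * scale)),
  };
}

console.log(thumbnailSize(4200, 2970)); // { width: 400, height: 283 }
```

The resulting size would then be handed to whichever rasteriser the PDF/DWG pipeline ends up using.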

Processing Drawings

Drawing processing follows a similar process to 3D file processing, except that PDFs will be handled directly by the NodeJS web service.
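The routing decision at the top of the flowchart can be sketched as a lookup on the file type. This is a minimal sketch that assumes the type is taken from the file extension; the route names are hypothetical labels, and sniffing the leading bytes (PDFs begin with `%PDF-`) would be more robust.

```javascript
// Sketch: route an uploaded drawing revision by file type, mirroring the
// flowchart. Route names are hypothetical; extension-based detection is an
// assumption.
const ROUTES = {
  pdf: "processPdfInApi", // handled directly by the NodeJS API service
  dwg: "queueForBouncer", // placed on the queue for bouncer workers
};

function routeDrawingUpload(filename) {
  const match = /\.([^./\\]+)$/.exec(filename);
  const ext = match ? match[1].toLowerCase() : "";
  const route = ROUTES[ext];
  if (!route) {
    throw new Error(`Unsupported drawing format: ${ext || "(none)"}`);
  }
  // "format" matches the field stored on the revision record.
  return { format: ext, route };
}

console.log(routeDrawingUpload("Level 1.DWG"));
```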

flowchart TD

    subgraph Queue Service
        Rabbitmq(callbackq)
    end
    subgraph API Service
        B{File type?}
        route[Upload revision endpoint]
        C[Extract the 1st page of the PDF]    
        PDFSuccess{Success?}
    end

    subgraph Model Processing Service
        D[Place task in queue]
    end

    subgraph bouncerworker
    onTask[New task from queue]
    bouncerProcess(Convert DWG to SVG)    
    bouncerComplete{Success?}    
    end

    User((User)) -->|Upload New Revision| route
    B -->|DWG| D
    route --> B
    B -->|PDF| C
    C --> PDFSuccess
    PDFSuccess -->|Status: ok| Rabbitmq
    PDFSuccess -->|Status: failed| Rabbitmq
    D --> onTask[New task from queue]
    onTask --> bouncerProcess
    bouncerComplete -->|Status: failed| Rabbitmq
    bouncerComplete -->|Status: ok| Rabbitmq
    bouncerProcess --> bouncerComplete
    C -->|Status: processing| Rabbitmq
    D -->|Status: queued| Rabbitmq
    bouncerProcess --> |Status: processing| Rabbitmq
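The status updates the diagram publishes to callbackq can be read as a small state machine. This is a minimal sketch: the `start` pseudo-state and treating `ok`/`failed` as terminal are assumptions drawn from the diagram, not a defined contract.

```javascript
// Sketch: status transitions implied by the flowchart. DWG uploads start at
// "queued"; PDFs go straight to "processing". "start" is a hypothetical
// pseudo-state, and "ok"/"failed" are assumed to be terminal.
const ALLOWED_NEXT = {
  start: ["queued", "processing"],
  queued: ["processing"],
  processing: ["ok", "failed"],
  ok: [],
  failed: [],
};

function applyStatus(current, next) {
  const allowed = ALLOWED_NEXT[current];
  if (!allowed) throw new Error(`Unknown status: ${current}`);
  if (!allowed.includes(next)) {
    throw new Error(`Invalid transition ${current} -> ${next}`);
  }
  return next;
}

// DWG path from the diagram: queued -> processing -> ok
const dwgFinal = ["queued", "processing", "ok"].reduce(applyStatus, "start");
console.log(dwgFinal); // ok
```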
PDF processing

PDF processing should happen in NodeJS rather than being put on the queue for 3drepobouncer to service. NodeJS has an abundance of PDF-reading libraries that will let us process these files with ease. The available libraries each do slightly different things, so a technical analysis is needed to decide which one best suits our needs. Found the following from a quick Google search:

We will only be dealing with the first page of the PDF, and ideally we want to convert it into an SVG. Failing that, it should be converted into a raster image (e.g. PNG, JPG).

If neither conversion succeeds, the drawing should be marked as failed to process, and an error with an appropriate reason returned to the user.
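The fallback order above (SVG, then raster, then failure) can be sketched independently of the PDF library chosen. In this sketch the converters are hypothetical placeholders injected by the caller, each taking the PDF buffer and returning `{ data, mimeType }` for the first page or throwing; none of this is a specific library's API.

```javascript
// Sketch of the fallback chain: try each converter in order (e.g. SVG first,
// then PNG) and return the first success. Converter functions are
// hypothetical placeholders for whichever PDF library is picked.
async function convertFirstPage(pdfBuffer, converters) {
  const reasons = [];
  for (const { name, convert } of converters) {
    try {
      const result = await convert(pdfBuffer, { page: 1 });
      return { status: "ok", ...result };
    } catch (err) {
      reasons.push(`${name}: ${err.message}`);
    }
  }
  // Every converter failed: the revision is still recorded (with the original
  // file) but marked as failed, with a reason to show the user.
  return { status: "failed", reason: reasons.join("; ") };
}

const exampleConverters = [
  { name: "svg", convert: async () => { throw new Error("unsupported content"); } },
  { name: "png", convert: async () => ({ data: Buffer.from("..."), mimeType: "image/png" }) },
];
convertFirstPage(Buffer.alloc(0), exampleConverters)
  .then((r) => console.log(r.status, r.mimeType)); // ok image/png
```

Keeping the converters pluggable means the outcome of the library analysis only changes the list passed in, not the fallback logic.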

NOTE: A failed revision should also be recorded, with the original file stored (this gets us closer to https://github.com/3drepo/3D-Repo-Product-Team/issues/446).

See data storage section to determine where files and references should be stored.

DWG processing

DWG processing requires the ability to read DWG files, which lives in 3drepobouncer. It should follow a similar process to 3D models, where the task is submitted to the queue for bouncer workers to pick up.

One variation: instead of putting the drawing file into the common shared space, we ideally want to first commit the revision record (marked as incomplete) and just pass the revision ID as the reference. The bouncer should then pull the file from the fileshare itself. (This should hopefully get us closer to https://github.com/3drepo/3D-Repo-Product-Team/issues/446.)
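A minimal sketch of the API side of this variation: commit the revision first, then queue a message that carries only identifiers. The `incomplete` flag name, the message keys and the in-memory array standing in for drawings.history are all hypothetical.

```javascript
// Sketch: commit the revision record (flagged incomplete) before queueing,
// then send only identifiers. Field names and the array-as-collection are
// assumptions for illustration.
const { randomUUID } = require("node:crypto");

function createRevisionAndTask(store, { teamspace, project, drawing, author, format, fileRef }) {
  const revision = {
    _id: randomUUID(),
    project,
    model: drawing,
    author,
    format,
    rFile: [fileRef], // original file already written to the fileshare
    timestamp: new Date(),
    incomplete: true, // hypothetical flag; cleared once processing succeeds
  };
  store.push(revision);
  // No file path or contents in the message: the worker reads the original
  // file from the fileshare via the revision's rFile reference.
  const message = { teamspace, project, drawing, revision: revision._id };
  return { revision, message: JSON.stringify(message) };
}

const history = [];
const { message } = createRevisionAndTask(history, {
  teamspace: "ts", project: "p", drawing: "d", author: "carmen", format: "dwg", fileRef: "f1",
});
console.log(message);
```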

See data storage section to determine where files and references should be stored.

I'm not sure whether we should put this task in modelq or jobq; it very much depends on how much memory and time processing takes. @sebjf, once we have an implementation, perhaps we can run a few files of various sizes through it and see what the memory/CPU utilisation is like?

Data storage
Revision records

The image reference should be stored as part of the revision record in drawings.history:

// Example drawing revision
{
    "_id" : LUUID("cad0c3fe-dd1d-4844-ad04-cfb75df26a63"),
    "project": LUUID("bfe0c3fe-dd1d-4844-ad04-cfb75df26a63"),
    "model": LUUID("aeb0c3fe-dd1d-4844-ad04-cfb75df26a63"),
    "author" : "carmen",
    "statusCode" : "S1",
    "revCode": "abc",
    "timestamp" : ISODate("2023-11-28T19:11:55.000Z"),
    "format": "dwg",
    "rFile" : [ 
        LUUID("cad0c3fe-dd1d-4844-ad04-cfb75df26a63")
    ],
    "image": LUUID("cad0c3fe-dd1d-4844-ad04-cfb75df26a63")
}
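A minimal shape check for the revision record above might look like this. The required-field list, the allowed formats and the single-entry `rFile` rule (one drawing per upload is expected) are assumptions for illustration; plain strings stand in for LUUIDs.

```javascript
// Sketch: shape checks for a drawing revision based on the example record.
// Required fields, allowed formats and the one-file rule are assumptions.
const REQUIRED_FIELDS = ["_id", "project", "model", "author", "statusCode", "revCode", "timestamp"];
const ALLOWED_FORMATS = new Set(["dwg", "pdf"]);

function validateDrawingRevision(rev) {
  const errors = [];
  for (const field of REQUIRED_FIELDS) {
    if (rev[field] == null) errors.push(`missing ${field}`);
  }
  if (!ALLOWED_FORMATS.has(rev.format)) {
    errors.push(`unsupported format: ${rev.format}`);
  }
  // rFile stays an array for parity with 3D containers, but a drawing
  // upload is currently expected to contain exactly one file.
  if (!Array.isArray(rev.rFile) || rev.rFile.length !== 1) {
    errors.push("rFile must be an array with exactly one entry");
  }
  if ("tag" in rev) errors.push("tag is not stored for drawing revisions");
  return errors;
}

const exampleRevision = {
  _id: "cad0c3fe", project: "bfe0c3fe", model: "aeb0c3fe", author: "carmen",
  statusCode: "S1", revCode: "abc", timestamp: new Date("2023-11-28T19:11:55Z"),
  format: "dwg", rFile: ["cad0c3fe"], image: "cad0c3fe",
};
console.log(validateDrawingRevision(exampleRevision)); // []
```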

NOTE:

Image file

The processed image file should be stored as part of the revision, and also be recorded in drawings.history.ref:

// drawings.history.ref
{
    "_id" : LUUID("0ac71fb2-a18d-4ef8-b5a8-b87879326300"),
    "type" : "fs",
    "link" : "197/204/f959eda6-3fde-436a-ba0e-b1aa777a413d",
    "size" : 2419670,
    "mimeType": "image/svg+xml", 
    "project": LUUID("bfe0c3fe-dd1d-4844-ad04-cfb75df26a63"),
    "model": LUUID("aeb0c3fe-dd1d-4844-ad04-cfb75df26a63"),
    "rev_id" : LUUID("cad0c3fe-dd1d-4844-ad04-cfb75df26a63")
}

(sebjf changed rid to rev_id)
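Building the ref entry for a processed image could look roughly like this. The fileshare `link` is passed in because the directory scheme behind values like `197/204/<uuid>` belongs to the storage layer and is not specified here; the function name and parameters are hypothetical, and random UUID strings stand in for LUUIDs.

```javascript
// Sketch: build a drawings.history.ref entry for a processed image. The
// link is supplied by the (unspecified) fileshare layer; names are
// hypothetical. Uses rev_id, matching bouncer's REPO_NODE_REVISION_ID.
const { randomUUID } = require("node:crypto");

function buildImageRef({ data, mimeType, link, project, model, revId }) {
  return {
    _id: randomUUID(),
    type: "fs",
    link,
    size: data.length, // bytes
    mimeType,          // e.g. "image/svg+xml" or "image/png"
    project,
    model,
    rev_id: revId,
  };
}

const ref = buildImageRef({
  data: Buffer.from("<svg/>"), mimeType: "image/svg+xml", link: "a/b/c",
  project: "p", model: "m", revId: "r",
});
// The revision's "image" field would then point at ref._id.
console.log(ref.size); // 6
```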

Goals

Tasks

tbd

sebjf commented 1 month ago

Hi @carmenfan,

Regarding the revision schema, is revCode the same thing as tag? (I am thinking about model and drawing revisions sharing a base class if possible.)

carmenfan commented 1 month ago


Yes and no. It serves a similar purpose, but with a different validator. I think it can share the same base class, but tag will never be stored for drawings.
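A shared base with type-specific fields could be sketched as below. The class names and the `revCode` pattern are hypothetical placeholders, since the actual revCode validator isn't specified in this thread.

```javascript
// Sketch: model and drawing revisions sharing a base class. The revCode
// pattern is a hypothetical placeholder for the real (unspecified) validator.
class RevisionBase {
  constructor({ _id, author, timestamp }) {
    this._id = _id;
    this.author = author;
    this.timestamp = timestamp;
  }
}

class ModelRevision extends RevisionBase {
  constructor(fields) {
    super(fields);
    this.tag = fields.tag;
  }
}

class DrawingRevision extends RevisionBase {
  static REV_CODE_PATTERN = /^[A-Za-z0-9]{1,10}$/; // hypothetical rule

  constructor(fields) {
    super(fields);
    if (!DrawingRevision.REV_CODE_PATTERN.test(fields.revCode)) {
      throw new Error(`Invalid revCode: ${fields.revCode}`);
    }
    this.revCode = fields.revCode;
    this.statusCode = fields.statusCode;
    // No tag: it is never stored for drawings.
  }
}

const drawingRev = new DrawingRevision({
  _id: "1", author: "carmen", timestamp: new Date(), revCode: "abc", statusCode: "S1",
});
console.log(drawingRev instanceof RevisionBase); // true
```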

For drawings, I want to tweak the division of responsibility a bit (and this is something I want containers to follow as well). I'm thinking that the backend should create the revision history entry, and bouncer should just update it as needed (e.g. appending the reference to the image).

So the message coming in from the queue will tell you the teamspace/project/drawing ID and revision ID, and you'd fetch the drawing by reading the revision entry that already exists. It is then no longer bouncer's responsibility to write all of this.
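The worker side of this split can be sketched in a few lines: look up the existing entry from the IDs in the message and only append the image reference. It is shown in JavaScript for consistency with the rest of this issue, with a `Map` standing in for drawings.history; the message keys and the `incomplete` flag are hypothetical.

```javascript
// Sketch: the worker never creates revision records; it finds the entry the
// backend already committed and updates it. Map-as-collection, message keys
// and the "incomplete" flag are assumptions for illustration.
function handleDrawingTask(revisions, rawMessage, imageRefId) {
  const { teamspace, project, drawing, revision } = JSON.parse(rawMessage);
  const key = `${teamspace}/${project}/${drawing}/${revision}`;
  const entry = revisions.get(key);
  if (!entry) throw new Error(`Revision ${revision} not found`);
  entry.image = imageRefId; // append the reference to the processed image
  delete entry.incomplete;  // the revision is now complete
  return entry;
}

const revisions = new Map([["ts/p/d/r1", { _id: "r1", incomplete: true }]]);
const msg = JSON.stringify({ teamspace: "ts", project: "p", drawing: "d", revision: "r1" });
console.log(handleDrawingTask(revisions, msg, "img1")); // { _id: 'r1', image: 'img1' }
```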

sebjf commented 1 month ago

Hi @carmenfan, in the schema above rFile is an array. Are we expecting multiple drawings per revision, and if so, should the importer create one entry in image for each in rFile (under the same uuid?)

carmenfan commented 1 month ago


As it stands, we'd only expect one drawing per upload, so one image.

I did contemplate this. 3D containers store it as an array (in anticipation of texture files, different file formats, etc.), and I decided to copy that, so if we want to do something similar in the future we don't have to change the type in the schema.

sebjf commented 1 month ago

Hi @carmenfan, bouncer contains the definition REPO_NODE_REVISION_ID as "rev_id". Do you want a new definition for "rid" or shall we change the (ref) schema above to use "rev_id"?