3drepo / 3drepobouncer

A C++ library providing 3D Repo Scene Graph definition, repository management logic and manipulation logic. It is is essentially the refactored 3DRepoCore and (parts of) 3DRepoGUI
GNU Affero General Public License v3.0
29 stars 13 forks source link

Add Drawings support #687

Closed sebjf closed 3 weeks ago

sebjf commented 3 months ago

Description

Add a command to bouncer client to import a DWG as an SVG to be saved to the new drawings collection, and hook this up to the appropriate (or new) queue in worker.

Goals

Tasks

Related Resources

Product Ticket: https://github.com/3drepo/3D-Repo-Product-Team/issues/377 Control flow: https://github.com/3drepo/3drepo.io/issues/5006 .io Endpoints: https://github.com/3drepo/3drepo.io/issues/4983

sebjf commented 3 months ago

Hi @carmenfan , first question!

The way the SVG exporter works is that it is much like our exporter that runs on an ODA database object. So, the readFile methods of FileProcessorDgn or FileProcessorDwg would be very similar, except they'd call a different method (e.g. importSvg instead of importDgn after setting up the views etc.

Do you want me to refactor the existing classes so they have two methods, one for importing to models & another for importing to Svgs, or just create new classes for the svg import and have a small amount of duplicate code between them?

(My instinct is that this should be in the same class, even if we end up duplicating view setup etc to make slight changes - after all the class is named after the file type it receives, not the type it outputs...)

carmenfan commented 3 months ago

@sebjf agreed, same class because of the reasoning you said 😆

sebjf commented 2 months ago

Hi @carmenfan,

I have a version of this working now. It is set up as follows:

image

RepoManipulator has a new method, processDrawingRevision, which handles the import and update of the drawing revision node and ref nodes. A class, DrawingManager, interacts with the database to retrieve and update revision nodes. It is in this definition that the schema is implemented. DrawingImportManager marshals revised FileProcessorDwg or FileProcessorDgn instances, which use the ODA SVG exporter device to create an SVG from the source files. This class is only concerned with file processing; it takes in a filename and outputs bytes (& soon a calibration I expect).

DrawingImageInfo is used to communicate between the three. For now, this class just holds the drawing name and a vector of bytes for the svg contents. It is created by RepoManipulator and passed via DrawingImportManager to the file processors. It has analogies to both GeometryCollector and RepoScene but is not directly equivalent to either; the SVG exporter creates something that is far closer to what we write immediately to disk than RepoScene, and there is no intermediate step required unlike GeometryCollector. Therefore the same type is passed directly through to the file processors.

The file processors work by storing a pointer to the DrawingImageInfo as a member. In readFile(), if this is non-null, the drawing import code runs. Similarly, GeometryCollector instances can now be null, and if not, the model import code runs. (The motivation for this is that there's a lot common between the two, but the use of try/catch blocks and the explicit initialisation/deinitialisation of ODA resources make having separate functions difficult).

I now just need to polish it a bit, and had a few questions/things to check in regards to this! (These basically boil down to how much effort do you want to put into type safety/verbosity...)

(1) Are you happy with DrawingImageInfo being passed right through, or would you like an intermediary (or higher level) type for decoupling purposes?

(2) Currently the DrawingImageInfo member is added directly to FileProcessorDwg & FileProcessorDgn, do you want it in the FileProcessor base class?

(3) At the moment, we have three ref node schemas to consider:

  1. Legacy/Models
{
    "_id" : "18f92421-5810-48a2-aaf2-6904011f5ad8.repobundle",
    "type" : "fs",
    "link" : "130/207/0bafc0a6-2f1f-4ea0-bf3c-ae5c4704bf0e",
    "size" : 193824,
    "encoding" : "gzip"
}
  1. Drawing rFile
{
    "_id" : "0ac71fb2-a18d-4ef8-b5a8-b87879326300",
    "type" : "fs",
    "link" : "197/204/f959eda6-3fde-436a-ba0e-b1aa777a413d",
    "size" : 2419670,
    "name" : "corner.dgn",
}
  1. Drawing Image
// drawings.history.ref
{
    "_id" : "0ac71fb2-a18d-4ef8-b5a8-b87879326300",
    "type" : "fs",
    "link" : "197/204/f959eda6-3fde-436a-ba0e-b1aa777a413d",
    "size" : 2419670,
    "mimeType": "image/svg+xml", 
    "project": LUUID("bfe0c3fe-dd1d-4844-ad04-cfb75df26a63"),
    "model": LUUID("aeb0c3fe-dd1d-4844-ad04-cfb75df26a63"),
    "rev_id" : LUUID("cad0c3fe-dd1d-4844-ad04-cfb75df26a63"),
}

Currently I am passing RefNodes and using existing APIs (such as the metadata argument to makeRefNode) to implement the schema.

(a) Shall we continue like this, or would you prefer a subclass for each of the above? (b) If we do add new classes/pathways, should we change the key type from string to LUUID on (2) and (3)?

(4) For testing I have added a flag, drawing, to the import command of 3drepobouncerClient, which triggers the drawing revision processing, rather than the model import. Would you like a new command for processing drawings, or leave as an import option? (Not sure of the implications for the queues etc for either of these!)

(5) Currently the namespaces need a bit of cleaning up. There are two new namespaces, repo::manipulator::drawingutility and repo::manipulator::drawingconverter, however DrawingRevisionNode is in repo::core::model, as are the file processors.

Do you want a new namespace repo::core::drawing, for example to hold DrawingImageInfo, and for them to be parallel where possible, or for it/them to be perhaps a child of model?

carmenfan commented 2 months ago

1) I think it's fine base on what you're saying, it's basically representative of the file

2) I think it should be in the baseclass

3) a) continue like this is fine as long as it works for you. Though I don't think the Rfile should have an incomplete flag? that should reside with the revision node like 3d, and also I don't think it needs an extension? b) I would prefer LUUIDs where possible

4) I think it's better if we have a separate command as it does pretty different things and takes in similar file types

5) repo::core should be representative to what we store in 3drepo database. Considering DrawingImageInfo doesn't live as an entity I don't think it should live there, sam ewith file processor (it's a manipulator, it's doing things to convert data). not sure if we should have a drawingconverter namespace either, can you tell me which class are in which namespace?

sebjf commented 2 months ago

Hi @carmenfan, thanks!

3 a) I think I misread and it should be in the drawing revision, though it won't make any difference to bouncer anyway! I will leave it like this then for now until there is a need to change it. 3 b) The way I have it now is that the DrawingRevisionNode stores all its entries as LUUIDs (including rFile and image), and then RepoManipulator turns them into strings when it hands them to FileManager. I will leave it like this if we aren't introducing dedicated types, since that is as far into LUUIDs as it is possible to go without introducing overloads throughout the file management classes.

5 The new classes are:

repo::manipulator::drawingutility::DrawingImageInfo repo::manipulator::drawingutility::DrawingManager repo::manipulator::drawingconverter::DrawingImportManager repo::core::model::DrawingRevisionNode

sebjf commented 2 months ago

Also

and also I don't think it needs an extension?

This was in response to

We will record the file extension in the object instead of having to regex it out of the rFile string like in containers.

Did that mean something else?

EDIT: Ah I expect I misread and it meant in the Drawing Revision Node! I will update it...

carmenfan commented 2 months ago

Also

and also I don't think it needs an extension?

This was in response to

We will record the file extension in the object instead of having to regex it out of the rFile string like in containers.

Did that mean something else?

EDIT: Ah I expect I misread and it meant in the Drawing Revision Node! I will update it...

Again I'm referring to the revision node 😆 so something Christos will handle on the backend

carmenfan commented 2 months ago

repo::manipulator::drawingutility::DrawingImageInfo repo::manipulator::drawingutility::DrawingManager repo::manipulator::drawingconverter::DrawingImportManager repo::core::model::DrawingRevisionNode

I think DrawingImportManager should probably live in modelConverter, if we see the file as the 3d model, then svg is just an outcome of converting the model.

Similar with the 2 in drawing utility, I think it'll be simplier to just put them in modelUtility.

This is just my thinking without looking at the code so do let me know if you don't feel like it make sense!

sebjf commented 2 months ago

Hi @carmenfan,

I have made those changes and am now looking at bouncer worker. I remember us discussing different possible queue configurations.

Looking at the code, messageDecoder (which builds the process args list from the queue object) is common. The only thing specific to the model import in modelQueueHandler is the processInformation, which expects some members specific to the import arguments (user, model, etc). However, this object only ever ends up in processMonitor, which only seems to use the Rid member.

jobQueueHandler is written a little differently but actually does the same thing as modelQueueHandler; using messageDecorder, runBouncerCommand, and generating the return message in the same way. The only real difference is the wording of the error in the catch block.

I am wondering then,

  1. Since we aren't sure whether we will put drawings in the model queue or their own at this stage, shall I refactor modelQueueHandler into something like bouncerQueueHandler, that can handle all bouncer commands, so we have one type of handler that can be attached to multiple queues?

  2. If yes, shall I also replace jobQueueHandler with bouncerQueueHandler, since we are working on it?

carmenfan commented 2 months ago

@sebjf prior to all that, can we run some stats to determine whether we want to use either of the queue or want to setup a new queue for drawings?

I think this all depends on how cpu/memory/time intensive it is going to be

sebjf commented 2 months ago

Some initial results for import times.

These figures were taken with the Visual Studio Performance Profiler in Memory Mode. Therefore the processing times are inflated, but memory should be quite accurate.

Drawing Source File Size (Kb) Time (s) Memory (Mb)
Drawing10 1989 2 54
Drawing3 2885 4 65
Drawing4 3248 18 963
Drawing9 6051 7 135
Drawing6 37420 24 910
Drawing1 47971 31 945
Drawing5 60660 16 681
Drawing8 61524 9 570
Drawing2 62615 16 1015
Drawing11 141186 88 3800
Drawing7 141621 87 4200

The sample drawings are real drawings taken from a couple of different projects. The source file size is used as an approximation of 'complexity', though as expected the use of references & instances does affect the resources used.

Overall, the memory peaks for even the largest drawings are a fraction of those of the models, but they are not exactly tiny.

Anecdotally, the import process has two peaks: one reading the file, which for the largest drawings was about 1.6 Gb, and a second when writing the SVG, which depending on the drawing can be much larger.