Nozbe / WatermelonDB

🍉 Reactive & asynchronous database for powerful React and React Native apps ⚡️
https://watermelondb.dev
MIT License
10.49k stars 589 forks source link

Manual insert of rows is not working when retrieving data. Numbers get lost #980

Open ruben4mobile opened 3 years ago

ruben4mobile commented 3 years ago

I am trying to insert a large data set of over 188.000 records into my DB . I uses the batch operation with prepareCreate. But that turned out to be too slow.

Code looked like this. (*but this a another table that had no issues, because of a small loop)

yield database.action(async () => {
    const collection = database.collections.get(QuestionnaireTable.name);
    const newRecords = questionnaires.map((quest) =>
      //@ts-ignore the extends Model
      collection.prepareCreate((questionnaire: Questionnaire) => {
        questionnaire._raw.id = `${quest.id}-${quest.version}-${locationId}`;
        questionnaire.title = quest.title;
        questionnaire.questionnaireId = quest.id;
        questionnaire.questionnaireVersion = quest.version;
        questionnaire.locationId = locationId;
        questionnaire.data = quest.sections;

        fillQuestionFields(questionnaire, quest.sections);
      })
    );
    await database.batch(...newRecords);
  });

So I thought I probably can make the insert faster if I can write my own insert query. So I ended up with this:

import { sanitizedRaw } from '@nozbe/watermelondb/RawRecord';

  const newParts = analysisPart.map((part: any) => {
    const analysisPart = sanitizedRaw({}, AnalysisPartsTable);
    analysisPart.analysis_part_id = part.id.toString();
    analysisPart.name = part.name;
    analysisPart.type = part.type;
    analysisPart.analysis_type = analysisPartType.valueOf();  // This is an enum and should be stored as number
    analysisPart.questionnaire_id = `${part.questionnaire_id}-${part.questionnaire_version}-${locationId}`;
    analysisPart.analysis_id = analysisId;

    let result = '(';
    const values = Object.values(analysisPart);
    let value;
    for (let i = 0; i < values.length; i++) {
      value = values[i];
      result += typeof value == 'number' ? value : `'${value}'`;
      result += i + 1 < values.length ? ',' : ')';
    }
    return result;
  });

  if (newParts.length > 0) {
    const insertParts = `insert into ${AnalysisPartsTable.name} ('${Object.keys(
      sanitizedRaw({}, AnalysisPartsTable)
    ).join("','")}') values ${newParts.join()}`;
    yield rawQuery(AnalysisPartsTable.name, insertParts);
  }

Resulting in a query like this

query insert into analysisParts ('id','_status','_changed','analysis_part_id','guid','component_id','object_id','name','order','type','analysis_type','questionnaire_id','analysis_id') values ('x1zgkur5fela5wbm','created','','1','null','null','null','Algemene gegevens',0,'general_data',1,'1_1_0_1-6-10230','10206')

With this being the rawQuery function

export const rawQuery = async (
  tableName: string,
  sqlQuery: string
): Promise<any[]> => {
  try {
    const tag = adapter._tag;
    const bridge = NativeModules.DatabaseBridge;
    const dirtyRecords = await bridge.query(tag, tableName, sqlQuery);
    const rawRecords = sanitizeQueryResult(
      dirtyRecords,
      schema.tables[tableName]
    );
    const records = database.collections
      .get(tableName)
      ._cache.recordsFromQueryResult(rawRecords);
    return records;
  } catch (err) {
    console.log('error rawQuery:', err);
    throw Error(err.message);
  }

In my case I don't have the need for a result because I am only using this rawQuery to delete and insert data. The deletion of the 188.000 records was simply way to slow.

Now the issue is that although the data get inserted the types seems to get as pointed out here https://github.com/Nozbe/WatermelonDB/issues/965

So how can I write such a query that it will work when fetching the data with watermelon? Or how to do it the right way with prepareCreate that does perform. It was taking 2 minutes to insert! instead of the less than 10 seconds with the manual query.

Im using the latest version of watermelonDB: 0.21.0

radex commented 3 years ago

Your insert query is very unsafe - please read up on SQL Injection.

As for performance, try this: https://github.com/Nozbe/WatermelonDB/blob/master/src/Collection/index.js#L113 -- admittedly, it won't perform as well as a highly optimized raw query, but it should be much faster than normal prepareCreate

ruben4mobile commented 3 years ago

Thanks for the quick resonse @radex Do you have an example on how that prepareCreateFromDirtyRaw works? I quess you still have to call it for every entry? But how does the model and raw data correspond in order for this to work?

And is it possible with a raw query, because it looks like all field are stored as blobs that watermelon is turning into number/sring/boolean so something tells me that would not be possible.

radex commented 3 years ago

Do you have an example on how that prepareCreateFromDirtyRaw works?

https://github.com/Nozbe/WatermelonDB/search?q=prepareCreateFromDirtyRaw

I quess you still have to call it for every entry?

yes, but with a POJO, not a Model object

because it looks like all field are stored as blobs

that's not true. https://www.sqlite.org/datatype3.html - SQLite does not have statically typed fields. They're not blobs, they're untyped, so on read they're the same type as what was written to them

ruben4mobile commented 3 years ago

PrepareCreateFromDirtyRaw is not available in type definition.

ruben4mobile commented 3 years ago

The PrepareCreateFromDirtyRaw is not performant enough. Would it be possible to create the tables with its type. Because from the https://www.sqlite.org/datatype3.html I gather that is does support types as a hint of how to resolve storing data in it. But If i look at my tabel definitions in flipper it doesn't have types.

CREATE TABLE "analysisParts" (
  "id" primary key,
  "_changed",
  "_status",
  "analysis_part_id",
  "guid",
  "component_id",
  "object_id",
  "name",
  "order",
  "type",
  "analysis_type",
  "questionnaire_id",
  "analysis_id"
)

But the schema we create does have them. So we could say that sting is TEXT, number: NUMERIC and boolean is NUMERIC or INTEGER

stale[bot] commented 2 years ago

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.