soci::RowSet support for social::Blob

schvarcz commented 2 years ago

Hello everyone.

First, thanks for this very useful library. We are really looking to use this library on our company due the support to several backends.

However, I see some room for improvement where I hope I can help.

On our case, we have heavily use the soci::blob. Right now, I am in a situation where I have to use it with soci::rowset. However, when I do something like m_rowset.get<blob>("m_blob"), soci returns the error:

/usr/local/include/soci/row.h:70:11: error: no matching function for call to ‘soci::blob::blob()’
         T ret;
           ^~~

I do understand the nature of the problem and why soci::blob needs to be initialised with soci::session. I also can see that similar issues were raised in the past, with the same nature.

Since I can't see another way to retrieve soci::blob objects in a bulk way, I had to dig into it.

I believe that we have 3 main points to check when we are talking about retrieving information with blob:

soci::rowset
soci::into
ORM

For the first case, the solution seems to rely direct on the core, since soci::rowset has a pointer to soci:statement; and soci::statement has a public pointer to soci::session.

If we can rearrange soci::rowset and soci::row so the soci::session pointer can reach the final class, we might solve the issue for the first case. (Ps: we also should be careful for the cases of soci::rowset<soci::blob>.

I would like to know what do you think about this solution. It seems to me that we are supposed to have a method to retrieve blob data in a bulk manner, since it is available to all other types and it seems to be a recurrent issue report here. If you agree with this implementation, I can work on that this week and submit a PR. Otherwise, I am open to other solutions.

Very likely I will dig into the case 2. in the near future.

Kind regards

schvarcz commented 2 years ago

Hello guys.

@vadz answered me on the PR:

Thanks for working on this!

I don't use rowset API myself, so I'm not too familiar with it and could be missing something here, but I'm a bit surprised by the need to modify the code not clearly related to rowsets to implement support for blobs there when they do work just fine with just use() and into() already (which I do use in my own code). It would be best if this could be avoided, i.e. if the backend files could remain unmodified.

Also, thank you for adding a test, but ideally the test should work for all backends, not just SQLite, and actually test that retrieving the blobs works, too.

Thanks again!

I will answer it here since the conversation seems to go beyond my initial expectations. I will explain what I mean.

First things first, thanks once again for your reply! It seems that everything is well connected inside the code, which is good, but sometimes takes a simple modification a long journey into the code. Which is good to learn the philosophy adopted here.

Yes, I want to implement these tests for the other backends. I need it working at very least on SQLite3 and Postgres (ideally on MySQL too). The remaining backends I will implement to keep the SOCI compatibility as it should. (See more on the "obs" section)

However, while I was reading the code related to blob, I realised that currently SOCI doesn't have a regular compatibility between different backends for this data type.

You said you are using use() and into() with blob, right? So I presume you are using or SQLite3 or Firebird, is that correct?

Currently, it seems that SQLite3 and Firebird are the only ones supporting use() with blob functions, as you can see in the tests for ORACLE and Postgres. (SOCI does not support blob for the others backends).

USE 1: SQLite3 and Firebird do use the function use() to save modifications as any other data type:

{ // For SQLite3 and Firebird
    blob b(sql);

    sql << "select img from soci_test where id = 7", into(b);
    CHECK(b.get_len() == 0);

    b.write(0, buf, sizeof(buf));
    CHECK(b.get_len() == sizeof(buf));

    b.append(buf, sizeof(buf));
    CHECK(b.get_len() == 2 * sizeof(buf));

    sql << "update soci_test set img=:blob where id = 7", use(b); // Save it on DB
}

a similar code can be seen here.

USE 2: While Postgres and ORACLE save the modifications on the database at the moment you call the function write.

{ // For Oracle and Postgres
    blob b(sql);

    sql << "select img from soci_test where id = 7", into(b);
    CHECK(b.get_len() == 0);

    b.write(0, buf, sizeof(buf)); // Save it on DB
    CHECK(b.get_len() == sizeof(buf));

    b.append(buf, sizeof(buf)); // Save it on DB
    CHECK(b.get_len() == 2 * sizeof(buf));
}

code copied from here.

Which forces the user to first create a dump empty blob inside the database with a SQL command:

{
    // Create a empty `blob` on `Postgres` 
    sql << "insert into soci_test(id, img) values(7, lo_creat(-1))";

    // Create a empty `blob` on `Oracle` 
    sql << "insert into soci_test (id, img) values (7, empty_blob())";
}

code copied from here and here.

Which doesn't make too much sense to me.... In order to insert a new information you have to run an insert, followed by a select, to only then insert your blob data.

The version implemented on SQLite3 and Firebird makes more sense to me. It is more similar to how we modify any other value on the database.

One could argue that blob data is saved as a string on SQLite3, where Postgres saves it in a file apart from the table itself. However, Firebird does a similar job as Postgres. Nevertheless, it is implemented as it is. Besides, imho, an ordinary user does not pay attention to these details on a regular use. Also, we could still provide a similar interface to an advanced user as it is today.

After digging the code, it seems to me that there is a confusion on the roles of class blob; and class blob_backend;. IMHO, the first should be a container into where the blob data navigates inside the code, similar to what class holder; does for the general use. The second one should be the backend functions to actually insert the data into the db. I believe this confusion came from the time when SOCI supported only Oracle and didn't use to need an abstract transport layer to support many backends. Just a guess.

USE 3: Also, for the other variable types, SOCI seems to support a "later evaluation" (or as you wanna call) to simplify the data input:

{
    statement st1 = (sql.prepare <<
        "insert into soci_test (id, name) values (:id, :name)",
        use(id), use(name));

    id = 1; name = "John"; st1.execute(1);
    id = 2; name = "Anna"; st1.execute(1);
    id = 3; name = "Mike"; st1.execute(1);
}

code copied from here.

Which currently will not work with blob data on any backend. I could try to implement something for this, but since it is not currently supported by SOCI, I will let it just as a comment for now.

Actually, the simple fact of re-using a blob class will yield errors:

{ // Another issue on Firebird (I believe that it doesn't affect SQLite3)
    blob b(sql);
    b.write(0, buf, sizeof(buf));
    CHECK(b.get_len() == sizeof(buf));
    sql << "insert into soci_test(id, img) values(1, :blob)", use(b); // Saves the blob on db.
    sql << "insert into soci_test(id, img) values(2, :blob)", use(b); // Saves an empty blob
}

I am adding this test on the tests batch and fixing where it is happening.

Only a suggestion: Talking about "later evaluation". It would be nice if we could have a blob function just to mark that a variable should be treated as blob by SOCI, and actually takes care for us of all necessary transformations.

{ // Only a suggestion
    std::vector<int> vec1 = {1, 2, 3, 4}

    statement st1 = (sql.prepare <<
        "insert into soci_test (id, m_list) values (:id, :blob)",
        use(id), use(blob(vec1)));

    id = 1; st1.execute(1);                    // Saves {1, 2, 3, 4}
    id = 2; vec1.push_back(5); st1.execute(1); // Saves {1, 2, 3, 4, 5}
}
{
    std::vector<int> vec2;
    sql << "select m_list from soci_test where id = 1", into(blob(vec2));

    print_vector(vec2); // Prints {1, 2, 3, 4}
}

I think it is doable, but it is out of scope of the discussion here and/or my initial motivation. I am letting it here just as a suggestion.

My initial goal was to support blob on rowset. So I can retrieve binary data on a table-like format, together with other informations, as I need for a current application we have here with SQLite3 that is migrating to Postgres and/or MySQL. I did have to change a thing or two to allow this use on Firebird and very likely I will need to change to allow on Oracle too. I am not going into details here, but the appropriated tests are going to be implemented on all backends too.

In short (or in tables). Current `blob` support:	Oracle	Firebird	Postgres	SQLite3	MySQL	DB2	ODBC
`into()`	✅	✅	✅	✅	🚫	🚫	🚫
`use()`	🚫	✅	🚫	✅	🚫	🚫	🚫
`use()` w/ later eval	🚫	🚫	🚫	🚫	🚫	🚫	🚫
`rowset()`	🚫	🚫	🚫	🚫	🚫	🚫	🚫

Where I am aiming:	Oracle	Firebird	Postgres	SQLite3	MySQL	DB2	ODBC
`into()`	✅	✅	✅	✅	❓	🚫	🚫
`use()`	✅	✅	✅	✅	❓	🚫	🚫
`use()` w/ later eval	🚫	🚫	🚫	🚫	🚫	🚫	🚫
`rowset()`	✅	✅	✅	✅	❓	🚫	🚫

OBS 1: If you wanna follow the CI tests, I am running them here. This repo has a good CI pipeline implemented for GitHub Actions, idk why it is not running on my PR here. So I forked the project to run it there. OBS 2: Just so you know, I wanna work on the final backend (Oracle) tonight or tomorrow. If everything goes fine, a PR will be ready to evaluation on Monday's morning.

vadz commented 2 years ago

Sorry, I didn't have time to look at this in details yet, but we do use BLOBs (or at least CLOBs, for XML storage, but I think they are very similar) with Firebird, Oracle and PostgreSQL, so I am not sure what the problem is. I don't think we ever reuse them, however, so there could be bugs when doing it.

Another question I have is what exactly do you mean by "good CI pipeline", i.e. how is it better than what we have? If you have any improvements to suggest for our CI workflow, please don't hesitate to make (another) PR for it, TIA!

schvarcz commented 2 years ago

I mean your CI pipeline, the GitHub actions one. For some reason, yours GitHub actions is not running on that PR, so I created a fork just to run yours GitHub Actions CI pipeline.

In my case here, we need a library that runs the exactly same code on different backends, which is not the case for blob right now. When you would have a time, please check it.

If you think it is not from SOCI's interest to normalize this use, I will keep that code on my fork and use from there.

vadz commented 2 years ago

The CI should run now, it just had to be manually authorized for the first run by a new contributor.

And of course all the backends should work the same, ideally. Anyhow, let's wait until your PR is finalized and discuss the final version when it's ready.

gtnardy commented 2 years ago

looking forward to this!

schvarcz commented 2 years ago

Hello Gabriel.

It is almost done for all engines. I am missing just oracle. But I work got me and I ended letting it aside for a time.

If you would like to help finish and/or would like to use to another backend, be my guest to checkout on my branch.

Kind regards

Krzmbrzl commented 10 months ago

I think all of the mentioned features here (except for the "tell soci to use arbitrary objects as blob-like data") are implemented in #992

zenggl commented 1 week ago

@schvarcz Have all these suggestions been implemented? Thanks

SOCI / soci

soci::RowSet support for social::Blob #922