bitwarden / server

Bitwarden infrastructure/backend (API, database, Docker, etc).
https://bitwarden.com
Other
15.14k stars 1.26k forks source link

PostgreSQL as alternate database provider #453

Closed kspearrin closed 1 year ago

kspearrin commented 5 years ago

Adding another database provider should be fairly straight forward. Bitwarden's data access layer is abstracted away with repository interfaces which can be found under src/Core/Repositories. I've already done the work needed for making this possible. All that is left is re-writing all of the SQL tables, functions, stored procedures, etc over to PostgreSQL and then wiring them up repository implementations for PostgreSQL. I've started this work in the following commits, which can be followed as a model going forward:

https://github.com/bitwarden/server/commit/84800da1fb046ab2da0ad14f6096f9ac04f60405 https://github.com/bitwarden/server/commit/acef40ee82e81ac81d4b17de09a2c174a3b7bfbb

Choosing a database provider is as simple as either providing a connection string for MSSQL or PostgreSQL, whichever you prefer.

I do not have a lot of experience in working with PostgreSQL, so I will need some help to complete this task.

Questions:

jeremyVignelles commented 5 years ago

I was just trying to setup bitwarden on my server and immediately saw ram usage getting high due to the use of SQL Server. I was wondering if there was a way to use another db provider, when I found this post.

I've never used PostgreSQL, but I'm curious to know if that would help my server support the extra load.

Why are you using stored procedures for? What other dbms-specfic feature are you using? Would SQLite be a candidate for small environments?

tamaralo commented 5 years ago

Hi there, long time lurker, first time poster. As somebody who is highly interested in this feature I would like to help with at least my two cents.:

Are PostgreSQL functions equivalent to MSSQL stored procedures?

PosgreSQL stored procedures capabilities surpass MSSQL's, as this was one of the key aspects of its development. For a short info/comparison: https://stackoverflow.com/questions/339744/better-languages-than-sql-for-stored-procedures

I've never used PostgreSQL, but I'm curious to know if that would help my server support the extra load.

What would be your server's setup? Win or *nix? Usually I would say yes, it should reduce some of the load, but I don't want to give you some false hope.

I do not have a lot of experience in working with PostgreSQL, so I will need some help to complete this task.

I never helped I any open source project out of fear of doing something wrong, but if you guys need some help and are willing to help guide me on how to contribute I would gladly help.

jeremyVignelles commented 5 years ago

I never helped I any open source project out of fear of doing something wrong, but if you guys need some help and are willing to help guide me on how to contribute I would gladly help.

You can't mess up a project on your own by doing bad things. If you have the right development skills, go ahead, the worst that could happen is spending too much time, but you'll always learn something in the process 😉

kspearrin commented 5 years ago

@tamaralo All database interaction in Bitwarden with MSSQL is done with stored procedures. You can find them (all all other schema) listed here: https://github.com/bitwarden/server/tree/master/src/Sql/dbo

My assumption is that we'd do the same in PostgreSQL using stored procedures (seems to behave differently in PostgreSQL) or functions. Our SQL data mapper (Dapper) seems to only work with PostgreSQL functions.

The majority of the work here is just translating the MSSQL schema (tables, indexes, sprocs/functions) over to their equivalent on PostgreSQL. After that it's just a matter of wiring them up to a C# repository class, which is fairly easy to do. Do you have enough experience in MSSQL and PostgreSQL to translate between the two? If so, I'd say you're qualified enough to help with this.

tamaralo commented 5 years ago

@jeremyVignelles 😊 I give it a try! I have a lot of fear/respect for opensource projects, as they seem to be so well organized and fast paced. ^^

@kspearrin: I worked with PostgreSQL, MySQL and some old versions from MSSQL ( in the 2000's ) so I think I could take a crack at it and try it at my server. The thing is, I study alongside my work and ATMI have some finals so, if its ok, I will try to take a look at it by the end of next week. 👍

kspearrin commented 5 years ago

@tamaralo If you have any questions when you get started, you can find me in our chatroom here: https://gitter.im/bitwarden/Lobby

tamaralo commented 5 years ago

@kspearrin Thx, I will come online by the end of next week (sorry, forgot the word next in the last post! Thanks for the prompt reply.

Kovah commented 5 years ago

The 2GB RAM requirement for SQLServer is the only thing holding me back from using Bitwarden. Would love to see an implementation for Postgres or any other database provider. 💙

indy-singh commented 5 years ago

Would love to help Bitwarden support PostgreSQL; though I tend to avoid stored procs/functions and prefer to bring that logic up into the code.

mrahbar commented 5 years ago

👍 For supporting PostgreSQL. I'm a big fan of Bitwarden and a happy paying customer. The only reason of not using a self-hosted Bitwarden is MSSQL.

On the topic tough I think that without any official endorsement of the Bitwarden team for PostgreSQL this endeavor is be doomed to failure. After the initial batch of SQL scripts is ported who is going to create new PostgreSQL scripts every time the equivalent MSSQL scripts are updated? Kudos for a defined data-abstraction-layer but I'm not optimistic when it gets to maintain two different database vendors.

So can this process be only an experiment or a real alternative solution for MSSQL?

FingerlessGlov3s commented 5 years ago

👍 This sounds like a good idea to me. Getting ram usage down and removing minimum 2 GB limit is good. Will make running it more resource efficient on virtual machines etc...

Although we do need support from Bitwarden themselves like @mrahbar says.

Agraphie commented 5 years ago

I would like to see this as well. Are there any experimental branches or any way I can help make this happen?

keliansb commented 5 years ago

Any news about this?

devployment commented 5 years ago

As briefly mentioned here I'll try to volunteer here. Will set things up locally and see where I hit roadblocks.

jeremyVignelles commented 5 years ago

@devployment : 👏 and good luck. Do you have any experience in working with Postgre SQL? Please keep us posted 🙂

devployment commented 5 years ago

Do you have any experience in working with Postgre SQL?

@jeremyVignelles, yes I do. It's been a while. But we'll see.

benjaminpreiss commented 5 years ago

Is anyone already working on translating the schemes? Are there any branches?

kspearrin commented 5 years ago

Not that I am aware of.

wusatosi commented 5 years ago

Do you have any experience in working with Postgre SQL?

@jeremyVignelles, yes I do. It's been a while. But we'll see.

Any update?

benjaminpreiss commented 5 years ago

I have started with translating but made close to no progress - but I also have very little experience with sql

Papina commented 5 years ago

Just read this while investigating postgreSQL. I have good experience with tsql to pl/pgsql so I might start to give this a crack.

Papina commented 4 years ago

im down to the nitty gritty now of functions. Im not super familiar with docker, but i cant seem to publish the mssql ports from the docker image successfully. Is there a switch i can add that will expose the 1433 port? when i add it it i get an error os OSX

i modified the function to publish port 1433:

function updateDatabase() {
    pullSetup
    docker run -i --rm --name setup -p 1433:1433 --network container:bitwarden-mssql \
        -v $OUTPUT_DIR:/bitwarden --env-file $ENV_DIR/uid.env bitwarden/setup:$COREVERSION \
        dotnet Setup.dll -update 1 -db 1 -os $OS -corev $COREVERSION -webv $WEBVERSION
    echo "Database update complete"
}

now i get the error:

MacBook-Pro:bitwarden benjamin$ ./bitwarden.sh updatedb
 _     _ _                         _
| |__ (_) |___      ____ _ _ __ __| | ___ _ __
| '_ \| | __\ \ /\ / / _` | '__/ _` |/ _ \ '_ \
| |_) | | |_ \ V  V / (_| | | | (_| |  __/ | | |
|_.__/|_|\__| \_/\_/ \__,_|_|  \__,_|\___|_| |_|

Open source password management solutions
Copyright 2015-2019, 8bit Solutions LLC
https://bitwarden.com, https://github.com/bitwarden

===================================================

Docker version 19.03.2, build 6a30dfc
docker-compose version 1.24.1, build 4667896b

1.32.0: Pulling from bitwarden/setup
Digest: sha256:e88f1611ff88c77a6255c49189ac3c965aaa3576fa6980ba54f2be10a96907b5
Status: Image is up to date for bitwarden/setup:1.32.0
docker.io/bitwarden/setup:1.32.0
docker: Error response from daemon: conflicting options: port publishing and the container type network mode.
See 'docker run --help'.

this is more for testing, so i can make sure the functions im rewriting actually produce the same data

kspearrin commented 4 years ago

You want to connect to the MSSQL instance of a Bitwarden installation? Just bash into the mssql docker container. Then you can run sqlcmd.

$ docker exec bitwarden-mssql bash

Papina commented 4 years ago

no, i want to connect to the mssql instance from the host, so i can run the development tools against the database directly, but still be able to use the bitwarden normally at the same time.

Papina commented 4 years ago

Screen Shot 2019-09-20 at 10 46 47

kspearrin commented 4 years ago

Ok. Then create the following file ./bwdata/docker/docker-compose.override.yml.

version: '3'

services:
  mssql:
    ports:
      - '1433:1433'

Restart. Now you can connect to localhost:1433 from tooling on the host.

Papina commented 4 years ago

awesome! got it working finally in dbeaver.

screenshot for those attempting the same, it needs a non-obvious driver, not the driver that is normally used for a windows SQL instance

Screen Shot 2019-09-20 at 12 28 10

I will continue with the conversions.

knocte commented 4 years ago

@Papina glad to see some progress; instead of asking for an update I'm going to chip in some funds with gitcoin. Hopefully other people interested in this can join me (adding more funds to it) to incentivise this work to be finished and merged to master.

gitcoinbot commented 4 years ago

Issue Status: 1. Open 2. Started 3. Submitted 4. Done


This issue now has a funding of 0.276 ETH (50.03 USD @ $181.26/ETH) attached to it.

gitcoinbot commented 4 years ago

Issue Status: 1. Open 2. Started 3. Submitted 4. Done


Work has been started.

These users each claimed they can complete the work by 1 year, 1 month ago. Please review their action plans below:

1) papina has started work.

1funtion per week until complete 2) ali8889 has started work.

برنامه ای عالی وباعث افتخار من هست که در این برنامه مشارکت میکنم وبه دانش ناچیزم اضافه میشه من دوست دارم که این برنامه به کار خود ادامه بده و به مخاطب خود سرویس بده . 3) ali8889 has started work.

برنامه ای عالی وباعث افتخار من هست که در این برنامه مشارکت میکنم وبه دانش ناچیزم اضافه میشه من دوست دارم که این برنامه به کار خود ادامه بده و به مخاطب خود سرویس بده . 4) alcaravanosu has started work.

I will study all the environment about Bitwarden and his tables and store procedure, to start translating from MSSQL to Postgres. I already start reading https://github.com/bitwarden/server/issues/453 and downloading de docker image to raise up a dev env. 5) yassefa24 has started work.

First step is to be familiar with the back up process of the db in the source db, next is to do an ETL process of the data into the postgress db and finally is to verify the data. This is all in a nutshell but this is the process I would approach.

Learn more on the Gitcoin Issue Details page.

Papina commented 4 years ago

Just started a new job, so I have been busy, but I have a few functions translated, just haven't pushed the updates

FingerlessGlov3s commented 4 years ago

Whats the latest update on the conversion?

Papina commented 4 years ago

waiting on two pull requests before i continue further

https://github.com/bitwarden/server/pull/578 https://github.com/bitwarden/server/pull/580

knocte commented 4 years ago

@Papina PR #578 still has a typo which the maintainer is requesting to be addressed. And #580 needs to be rebased because it has conflicts. So it looks like what needs to be done is take action and not wait, or am I missing something?

Papina commented 4 years ago

spelling mistake is fixed, didnt see that review. the conflicts are updates to existing views, just adding in 'OR REPLACE' to various DDL

knocte commented 4 years ago

There are now conflicts in both PRs :)

Morl99 commented 4 years ago

Hello, I want to chime in and share my opinion that the current approach is futile and will leave the project in a state where maintenance is really hard.

I suggest that we introduce Liquibase to manage the DBMS Setup. It should also support writing DBMS agnostic Functions/Procedures [edit: no, it doesn't] . And the best part is that converting to Liquibase can be done tool based, see their documentation: https://www.liquibase.org/documentation/existing_project.html

I am willing to help with this, if the community and maintainers thinks that this is a good idea.

I am personally motivated because I want to run my own instance of Bitwarden, but mssql is a deal-breaker for my setup. I do have a lot of pssql experience (we run about 100 pssql instances at work, though we spin them up in AWS).

knocte commented 4 years ago

I'm no maintainer, but seeing the other approach seems to be getting no love, I'd go for it @Morl99, worst thing that can happen is that both strategies get committed and the one that has the least maintenance work survives.

Morl99 commented 4 years ago

I thought about it some more, and I fear that the approach is a good idea, but the tool might not be a good fit for a .NET environment, and it would be some work to bootstrap this at runtime (like adding a Java runtime to the docker container). I will look into similar tools for .NET. Or does anybody have any experience to share?

mnuccioarpae commented 4 years ago

I fear that there's no such thing as DBMS agnostic functions or procedures.

Morl99 commented 4 years ago

Yeah I arrived at the same conclusion. But it would be a help if the schema itself would be declared in a DBMS agnostic way.

Another option would be to reduce the amount of procedures and pull some of that functionality into the application (using sql statements if necessary).

I am specifically looking for recommendations for a dbms agnostic schema creation/update tool that plays well in a dot.net environment.

knocte commented 4 years ago

Another option would be to reduce the amount of procedures and pull some of that functionality into the application (using sql statements if necessary).

I bet the best approach for that would be using EntityFramework or some other ORM, but I also bet that the maintainer of bitwarden would not be happy about this drastic change.

Morl99 commented 4 years ago

Let us just ask 😉 @kspearrin do you have an objection against EntityFramework? I had used it in my Bachelors Thesis, and tough the development flow was really nice. But it is a long time ago and the project was a lot smaller than Bitwarden. It would surely be challenge to convert, but I think it could also be a lot of fun.

kspearrin commented 4 years ago

@Morl99 Yes, I am not a fan of ORMs like EF. Too much “auto magic” code generation. Stored procedures are always preferred.

kspearrin commented 4 years ago

@Morl99 we could still consider EF as the base for other alternative database repository implementations, however. It’s just that we cannot use EF for our Sql server production environment which has a much larger scale than the on-prem installations people here are considering using Bitwarden for. So in the end we could have sql server repositories and EF repository implementations.

jeremyVignelles commented 4 years ago

In my experience, EF slows down the startup of application because of the huge reflection usage. Is that still true in 2019?

Yes, using EF could save a lot of dev time in providers implementation, but it adds a dependency that might not be wanted, and adds an overhead to the application.

jeremyVignelles commented 4 years ago

May I suggest the use of Dapper? It has been created with perf in mind at stackoverflow, and it seems database-agnostic.

knocte commented 4 years ago

IMO slow startup is not an issue for server apps because it just start once, and is running most of the time. Also IME Dapper is worse than EF (more weakly typed) but I don't want to start a flamewar here.

jeremyVignelles commented 4 years ago

It's not about starting a flamewar, we're just discussing alternatives. I don't have any decision power either.

After posting, I realized that Dapper might not be a good fit because you still need to write your SQL queries for anything more complex than a SELECT, so you're going to write DB-specific queries anyway.

You may be right, EF startup time might not be an issue for server-side applications, and I'm pretty sure that things got better since EF core.

EDIT: wouldn't that require to rewrite all our models though, to put annotations like foreign keys etc...?

mnuccioarpae commented 4 years ago

I agree that stored procedures are better. The problem I have is with the comments in the stored procedures, which do not explain the code.

For example in https://github.com/bitwarden/server/blob/master/src/Sql/dbo/Stored%20Procedures/User_DeleteById.sql there's no comment explaining why ciphers are deleted in batches of 100 records and why this is done in a separate transaction. What happens if the second transaction fails? The user remains, but his ciphers are gone? If yes, why this is desired behaviour?

I am pretty sure that all the T-SQL procedures can be translated in an equivalent PostgreSQL procedure (expecially if we require latest PostgreSQL), but it's hard to do without some help.

Maybe an automated tool like one of the ones listed at https://wiki.postgresql.org/wiki/Microsoft_SQL_Server_to_PostgreSQL_Migration_by_Ian_Harding can be helpful at least for the first translation?