Open bencomp opened 2 months ago
Interesting -- a lot of those dependencies are regulated by global constants set in the initializer, as we need to control access to the features that depend on them in the UI. I wonder if it would be possible to move some of those to bundler groups, then load the groups based on the constants?
@benwbrum I just edited my first comment to clarify that it's more likely that capistrano
could be left out than ruby-openai
. Making OpenAI (or Bento or Transkribus or ...) optional would be interesting, but that kind of customisation can definitely wait. I do consider this issue to be about optimisation; I haven't tested if this indeed improves performance by reducing memory footprint and/or loading times.
Let's talk a little more about open source distributions. Right now we have a development process that goes like this:
Feature development happens in feature branches based off of development
feature a
+ feature b
+ feature c
-> development
Deployment happens in two different branches
development
-> fromthepage.com
for deployments on our SaaS; updated once or twice per week and deployed using capistranodevelopment
-> main
for deployments of open-source code other servers, updated a few times a year and deployed in a wide variety of ways ranging from containers to raw git pull
statements, and perhaps capistrano as well.Since there is code specific to the SaaS business that open source installations don't want, we have some commits which remove this from main
. So we could add code to remove some of these dependencies from main
, but the problem is that we don't know which dependencies are actually being used "in the wild". It's possible that the University of Melbourne is using capistrano
, even if Utrecht isn't.
Thoughts?
There is also the consideration of how huge of an impact does it make on the docker size and speed. We will also have to coordinate to open-source that we are doing this and they have to re-add the necessary gems back if they need it (which again as @benwbrum mentioned we still don't know fully which is which)
But I agree that we do need to identify which is necessary for each cases (development, test, opensource, etc)
Thanks for clarifying, @benwbrum! I would still like to differentiate between making FromThePage features optional (like OpenAI and Transkribus integrations, and SaaS-only aspects like redirecting the homepage to /landing
) and understanding which gems are used only in development or testing, which are used for deploying and which are only used in production on the server. I understand that there isn't always a clear distinction to be made and my knowledge of Ruby on Rails is still limited. I also don't want to interfere in your business :)
As @WillNigel23 says, coordinating with admins of other FromThePage instances is necessary, because it is likely that installation instructions may change.
This issue should be considered together with #4291 as I think that no gems need to be removed from the Gemfile, just grouped (except uglifier, as it is not used). It is very likely that other parts of the code need to be changed, to be more selective in the gems that certain parts of the code require
or when they require
gems, like in config/initializers/openai.rb
:
require 'openai'
if ENABLE_OPENAI
OpenAI.configure do |config|
config.access_token = OPENAI_ACCESS_TOKEN
end
end
Perhaps require 'openai'
could be moved to after if
? Note that I'm again using OpenAI integration as an example, which is an application feature, but it was the easiest example to find.
After #4291, I envision that installation instructions may include something like:
- If you do not use Capistrano to deploy, you can
bundle config set --local without capdeploy
beforebundle install
.- If you precompile assets on your local machine and copy them from the
public/
directory to the server, you canbundle config set --local without assets
on the server beforebundle install
.
These examples are based on an idea to put the capistrano-related gems in a :capdeploy
group and all gems needed for asset compilation in the :assets
group in the Gemfile.
Perhaps some of the SaaS features/specifics could be extracted from the shared codebase into separate gems, eventually, so that building and deploying the SaaS instead of the basic version is a matter of configuration to include the extras instead of applying specific commits to disable and/or remove SaaS specifics. Or the views as they are should be updated to render SaaS specifics only if ENV["SAAS"]
is set, for example. I'm probably oversimplifying here.
Good point on differentiating between product features and deployment-specific code.
In an ideal world, development
would contain only shared code, fromthepage.com
would contain only SaaS code, and main
would contain F/OSS installation-specific code. In that case, trial account creation and landing pages would live in fromthepage.com
, the container stuff would live in main
, and neither would be in development
. We tried something like this for a while, but discovered that Github's automated PR merge system kept merging fromthepage.com
into development
before merging development
into fromthepage.com
, so whenever we'd cut a new release on main
we'd discover all this SaaS-specific wording that was inappropriate. As a result, we merge on the command line whenever we create a new release on main
, and main
contains the F/OSS installation specific commits that remove the landing pages, etc. This is a pain, but since most of the development activity is between development
and fromthepage.com
, it's what we've optimized for.
So I think that it makes sense to conditionally load the OpenAI gem (and others) in development
, but some of the SaaS-specific extraction and F/OSS deployment code should live in main
.
Notifying other installations is also tricky -- we are friends with the folks running installations at University of Texas-Austin, University of Nebraska-Lincoln, and of course University of Leiden, and have corresponded briefly with folks at Utrecht and Melbourne, but we have no contact at all with the people running installations at Bolsano, Taipei, or Buenos Aires (and of course there are likely installations which we don't even know about.)
Technically: I'd be very happy to merge the conditionalization of require
if you find that it works.
Did you consider creating a script and/or patch file to perform the steps that make the code on the development branch ready for the main branch (or a release)? During the build of the container image I also run some steps to e.g. remove test code and fixtures and remove the newrelic_rpm
dependency from Gemfile and Gemfile.lock. Since I build from the development branch by default, I feel that a script (that lives in the same branch) could be helpful to transform the code into "production" code.
If this works, you could automatically keep main
in sync with development
if you wanted.
I tried to
bundle install
without the development and test groups of gems, but then the application would not start. I reported this in https://github.com/benwbrum/fromthepage/issues/4291#issuecomment-2326368286The OpenAI and Bento gems are bad examples, because the application itself relies on them. Capistrano and related gems are better examples, since they are used for deployment only (and I would not need them when I deploy the application in a Docker container).
The Bundler documentation on groups says:
1e8fa225c602d914c676df2534e1ac90df8d5090 was the latest change to this line in application.rb; it's not clear from the commit message or in-line comments why this was necessary. I think it should be possible to change the call so that fewer gems are
require
d at runtime.