[Bug] Request from Pulsar not handled appropriately

Daeraxa commented 1 year ago

See https://discord.com/channels/992103415163396136/992110062992621598/1046929553261613066 for initial confusion and wild assumptions.

Honestly not sure if this is for PPM or the backend but assume the backend because I imagine the same situation will arise if used for Atom.

Basically it seems that, whilst the backend technically isn't doing anything wrong, it isn't the expected response.

To replicate:

In "Install packages" click the "Hydrogen" package to go to package details
Pulsar will error:

Trying to install leads to a similar problem:

(ppm has similar responses, see the discord link at the top for @Spiker985's results).

It seems that Pulsar is requesting "Hydrogen" and not "hydrogen". You can see this if you click the Hydrogen title in the package install screen, it tries to go to: https://web.pulsar-edit.dev/packages/Hydrogen which is a 404 and not https://web.pulsar-edit.dev/packages/hydrogen which is the actual package page.

Unsure if related but you can also see in the first image that the download count is missing.

From some very light digging it seems that in the API object it has name: hydrogen" but metadata.name: "Hydrogen. Not sure if this is just coincidence.

confused-Techie commented 1 year ago

Alright, so after some research with @Daeraxa on Discord, we have discovered the root cause of this problem.

The original Backend turns all package names to lower case, in every instance. Which we do not do.

Hydrogen was an edge case, since the name in their package.json has "Hydrogen", meaning there was a mismatch during some requests. Specifically when PPM went to collect the information from the package it used the following code:

requestSettings =
      url: "#{config.getAtomPackagesUrl()}/#{packageName}"
      json: true
      retries: 4

Meaning that PPM assumes it's fine to take the name field from the package.json and use it to request for a package.

But currently the backend matches case and spelling exactly when requesting a package.

With this new insight into how the backend originally worked, we will need to ensure that all endpoints will convert any Package Name present to lowercase.

We also need to ensure that when a package is published it will also be converted to lowercase before being entered into the Database. @Digitalone1 just making sure you're aware of this issue.

As for our priority of getting this issue resolved in production I'll get started as soon as I can on resolving the issue seen here. And further steps will need to be taken to ensure that only lowercase names are actually published to the database.

Thanks a ton for this issue, glad we were able to find the root cause on this one.

confused-Techie commented 1 year ago

A quick update, for viewing purposes this issue is now fixed. I've done an emergency update to the backend.

WARNING

This issue still needs to be resolved for all other actions with packages. This is only fixing for viewing purposes of packages.

Knowing that I'll leave this issue up until it is

Daeraxa commented 1 year ago

Just to chime in that it does appear to be working just fine now, link, details and install. One warning, if anyone tried to access or install Hydrogen it might still show errors until the electron cache gets cleared, I did it manually by renaming ~/.config/Pulsar,

Digitalone1 commented 1 year ago

So, I was thinking about this. In addition of what I already said in #88, I wonder why why need decodeURIComponent on the name.

Isn't the name already decoded before it's passed to the package handlers like other parameters?

Besides, we need to unsure which is the correct name format used by the backed and force it with a regex.

I'll start with the following: /^[\.\-a-z]+/$

And also add it as a constrain on names column in the db.

Do you know if characters beyond the latin format are allowed by the beckend (or github repo)?

Digitalone1 commented 1 year ago

So in the PR I added other adjustments to ensure also the proper semver format.

For the database consistency we have to:

[ ] find out a format telling the allowed characters for the package names
[ ] enforce this format adding a constrain on packages.name and names.name columns
[ ] enforce the semver format adding a constrain on versions.semver column

I will find out the postgresql syntax to do add a constrain on a column.

Digitalone1 commented 1 year ago

Oh, another thing:

// now with all the versions properly filled, we lastly just need the release data.
    newPack.releases = {
      latest: repoTag[0].name.replace("v", ""),
    };

How do we know repoTag[0] is the latest? They could not be sorted. We need to compare to check the latest as already did in removePackageVersion.

Spiker985 commented 1 year ago

How do we know repoTag[0] is the latest? They could not be sorted. We need to compare to check the latest as already did in removePackageVersion.

Excellent find

confused-Techie commented 1 year ago

@Digitalone1 the latest key, yes a fantastic find. We will have to compare these values directly to determine what truly would be the most recent semver. Now depending on the object that we have, I wonder if we can just properly ask the DB obj what the latest tag is. But I'm unsure if that snippet is during return object creation or during an object creation for publishing.

But otherwise for the names:

For allowed characters, what you have seems about right, otherwise I would look at the requirements for npmJS name field as they will likely have the best documentation for that. In the regex you could add the character limit if you'd like, but otherwise something basic like that seems good. But I'm honestly not sure if we need to be super strict on names of the packages other than the limits we have in place.

Since we already don't allow over a certain length, and are already allowing any mix of case. But as for using decodeURIComponent really came from a place of concern since the original Atom backend had many names that were not URL safe which was a pain initially. And while the query parameters are URI decoded I wasn't sure if the path itself was, since the name is usually coming from a path and not from the query parameters.

And last response to add, while we are trying to lock things down, if the db can handle it, I honestly don't see why not support latin characters? They don't even need to be supported by GitHub because the name is only within the package.json. I'd like to be able to support as much as possible, but don't want to make this an extra headache for ourselves

Digitalone1 commented 1 year ago

I wonder if we can just properly ask the DB obj what the latest tag is. But I'm unsure if that snippet is during return object creation or during an object creation for publishing.

It was from the json repo, but it can be done also for the database. Sorting the semver is tricky, but it could be done with the SQL using the virtual columns.

And last response to add, while we are trying to lock things down, if the db can handle it, I honestly don't see why not support latin characters? They don't even need to be supported by GitHub because the name is only within the package.json. I'd like to be able to support as much as possible, but don't want to make this an extra headache for ourselves

If we don't want restrictions, we could also support uppercase characters, but having a case insensitive search. It's interesting that you used ILIKE which is case insensitive, so I wonder why was it not working?

Digitalone1 commented 1 year ago

I'm reading the regex documentation of PostgreSQL: https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP

If we don't want restrictions, we could end up using regex, but they lead to other headaches because we should cover much more cases. There's also the issue of matching -, _ and . which are special characters that we have to escape.

Considering that the pattern are made from the string provided by the user, there's also the issue of having SQL injection leading to hard and complicated regex pattern which could affect the performance.

So I don't think regex could be a solution. A possible option that I had in mind is to not restrict to lowercase, neither blocking -, _ and . characters, but create a virtual column made by the lowercase of names which is used to search with the simple LIKE (no regex) which is faster and safer. If you agree on that path, I should read how to do virtual columns on PostgreSQL (already did on my db on MariaDB).

confused-Techie commented 1 year ago

Alright, so I believe most issues here have been taken care of and we are good to close. Please point out if that's incorrect:

Lowercase: The DB now restricts names to lowercase, as well as any endpoints handling the name will convert to lowercase, and so will the publish functionality.
Finding the latest semver: Handled as needed in #89

Thanks to everyone that helped out on this issue

confused-Techie / atom-backend

[Bug] Request from Pulsar not handled appropriately #86

WARNING