SwiftPackageIndex / SwiftPackageIndex-Server

The Swift Package Index is the place to find Swift packages!
https://swiftpackageindex.com
Apache License 2.0
559 stars 47 forks source link

Source Code Available Flag? #1025

Closed Sherlouk closed 2 years ago

Sherlouk commented 3 years ago

I wonder whether it would be possible (good?) to display whether or not a package exclusively uses a binaryTarget or not?

What I mean by this is that there are some Swift packages (example) who's sole purpose are to (re)distribute and make available precompiled xcframeworks to SPM. In these scenarios, there is no source code.

daveverwer commented 3 years ago

Would a slightly more generic way to reveal this information be to allow target information to be inspected? There are so few packages that are binary only that this maybe feels like it's too specific a feature for me.

Sherlouk commented 3 years ago

Is it a specific feature? Yes, but does it have a great impact in my decision making about whether or not to use a package? Also yes.

I also feel like the reason so few exist currently is because they're predominantly used by closed source companies who are still using CocoaPods and have not moved over to the new world yet

Whether this is available as it's own simple flag, or visible through some other format (such as a more generalised way to inspect the target information), I'm not as fussed so happy to change the subject of this but thought I'd add some more context 👍

finestructure commented 3 years ago

I'd probably turn this around and mark the exceptions as "binary only".

samdeane commented 2 years ago

As a first stab are you imagining an additional icon on the information page for a package - in this area?

image

(assuming that we're talking about just a "binary only" flag and not something that lets you inspect target info.)

daveverwer commented 2 years ago

Thanks for investigating this @samdeane!

(assuming that we're talking about just a "binary only" flag and not something that lets you inspect target info.)

Yep! I never clarified but @Sherlouk made good points and while we may do target inspection one day, this task is just for a "no source code available" or "binary only" warning, which is useful.

As a first stab are you imagining an additional icon on the information page for a package - in this area?

Yes. Absolutely. I think it might fit well as a full-width item towards the top of the metadata. Possibly even just under the archivedListItem as it’s an important thing to understand about a package before you use it.

I found it useful to think about the UI that we might have for this as part of thinking about it and even though I :+1:’ed @finestructure’s suggestion of “binary only”, that may slightly obscure the issue to people who are not paying full attention (like me, most of the time) and I think we need to explicitly mention that there’s no source code, too.

Some preliminary ideas for wording:

“This package does not include source code. There may be more information available on why in the [README] or [LICENSE] file.”

With [README] being a link to the #readme anchor and [LICENSE] being a link to the license file (which we already have in the page).

Or…

“This package only contains binary targets, meaning that source code may not be available. There may be more information available on why in the [README] or [LICENSE] file.”

I prefer the second one as it does give the possibility of them including source code but just not as a target in the package. We don’t know, so we probably shouldn’t be so definitive with our statement.

daveverwer commented 2 years ago

I mocked it up and played with the copy in the UI a bit. It's a bit wordy, covering three lines.

Screenshot 2022-07-14 at 20 32 42@2x

This may be better:

Screenshot 2022-07-14 at 20 32 10@2x

Might even be worth bolding the "source code may not be available"

Screenshot 2022-07-14 at 20 37 58@2x
daveverwer commented 2 years ago

I'm not sure about the orange exclamation mark by the way. This is important to know but it's not necessarily a red (or orange) flag.

samdeane commented 2 years ago

I've done the UI changes required for this, but still need to do the code to actually feed into whether the package is binary-only.

Am I right in thinking that the basic test for this is to check the package to see if it only contains binaryTarget targets? How about if it also contains testTarget targets (but no plain target targets?)?

Looking at your Manifest struct, you currently don't seem to record the target type - so I guess that will need changing? Then the analysis stage will then need to read this value for all targets, work out whether the package as a whole is binary-only, and store that somewhere (in a new bool field on the Package entity?)

daveverwer commented 2 years ago

Am I right in thinking that the basic test for this is to check the package to see if it only contains binaryTarget targets? How about if it also contains testTarget targets (but no plain target targets?)?

Yea I think if it contains test targets but no "normal" targets, I'd still think of that as binary. I think, however, this will be rare. Here are some examples:

This next one is interesting. Two binary targets and a regular target. As a consumer of this feature, if my company had a policy of not using closed source packages, this would get past, but I don't think it should.

I wonder if the flag should be triggered by any binary target. I think it should. The wording will need an update but in some ways that makes the wording easier as we can say something along the lines of "not all source code for this package may be available"

daveverwer commented 2 years ago

Looking at your Manifest struct, you currently don't seem to record the target type - so I guess that will need changing? Then the analysis stage will then need to read this value for all targets, work out whether the package as a whole is binary-only, and store that somewhere (in a new bool field on the Package entity?)

Yes I don't think we pull any target information out right now. We have a few choices here, depending on how big your appetite is:

  1. We could perform all the logic in the analysis process and store a boolean flag based on whether there are any binary targets.
  2. We could grab all the target information and store it in a JSON field and then perform the logic as the page is being constructed. This would leave the door open for more features based on targets in the future.

Whichever you have the appetite for, it won't be on the Package entity. It will be on Version as it could change from version to version. We'll probably only ever use the information from the default branch "version", but we should process it for all of them.

finestructure commented 2 years ago

Quick note from me: Is it worth running a check how many packages would actually have the flag? It feels like this can turn into quite a tricky thing to figure out and then add into analysis - but if it ends up affecting only half a dozen packages it might be easier to have them flag their package as binary in the SPI manifest file.

daveverwer commented 2 years ago

Quick note from me: Is it worth running a check how many packages would actually have the flag? It feels like this can turn into quite a tricky thing to figure out and then add into analysis - but if it ends up affecting only half a dozen packages it might be easier to have them flag their package as binary in the SPI manifest file.

I fear that if we make this into something package authors would need to opt-in to by manually adding a flag to the .spi.yml, no one will do it. The warning that this adds potentially dissuades people from using your package, so there's not much incentive for package authors to add it. We also know how hard it is to make people add/change that file, even if it's something that benefits the package!

There's an argument that the effort in altering analysis might not be worth it for this feature, but even if it only affects a few packages, I think it's a good indicator. I know of several companies that do have that policy of not using libraries where the source is not available.

samdeane commented 2 years ago

We could grab all the target information and store it in a JSON field

To do this, I think we'd have to extend Manifest.Target's definition to fully match SPM's model (or as much of it as we cared about). We'd then re-encode it as JSON and store that in the model.

This is because Manifest.Target is decoded as part of decoding Manifest - so we can't just grab the JSON that described it, we have to recreate it.

I don't mind doing that, but I'm wondering if we'd end up getting into compatibility problems later if the SPM target model changes? Also, it might be a bit of a YAGNI situation.

Perhaps a compromise is to just add type to Manifest.Target for now, but to still re-encode it and store it in the model as a JSON field.

That way we're not over-complicating things now, but there's room for expanding the target info that is stored in the model later, without having to add further fields.

daveverwer commented 2 years ago

Perhaps a compromise is to just add type to Manifest.Target for now, but to still re-encode it and store it in the model as a JSON field.

This sounds perfect to me

samdeane commented 2 years ago

That way we're not over-complicating things now, but there's room for expanding the target info that is stored in the model later, without having to add further fields. This sounds perfect to me

Great. Which also means as you said earlier that the logic for this particular feature doesn't have to live in the analysis code.

samdeane commented 2 years ago

I'm happy to give that all a stab. Do you make piecemeal migrations when adding fields, or do you just brute-force rebuild the database since it's all generated anyway?

samdeane commented 2 years ago

(I've done Vapor stuff before so I know how that side of things works - ish... ;))

daveverwer commented 2 years ago

Whichever you have the appetite for, it won't be on the Package entity. It will be on Version as it could change from version to version. We'll probably only ever use the information from the default branch "version", but we should process it for all of them.

I made a mistake here. I should have looked it up before speaking! 😅

The data will go on the products table in the type field. From a quick look in the current database, we may not be bringing binary targets in at all, but I'll leave you to investigate that. Happy to look at it in more detail if you have questions, though! Here's a DISTINCT on all the types we have.

Screenshot 2022-07-15 at 18 39 31@2x

Apologies for the mistake!

daveverwer commented 2 years ago

I'm happy to give that all a stab. Do you make piecemeal migrations when adding fields, or do you just brute-force rebuild the database since it's all generated anyway?

We make migrations. While the data is all generated, that generation would take weeks of processing time (all the builds) so we never reset the database. There are migrations in Sources/App/Migrations you can use to get started.

In terms of backfilling the data, if that's going to be necessary we have a process we can kick off to re-analyse all versions. We'll figure out if that's necessary as we get closer to merging and we will handle kicking that off.

daveverwer commented 2 years ago

Whichever you have the appetite for, it won't be on the Package entity. It will be on Version as it could change from version to version. We'll probably only ever use the information from the default branch "version", but we should process it for all of them.

I made a mistake here. I should have looked it up before speaking! 😅

I'm wrong again! It's been a busy day 🙄

Products and targets are totally different things, and I was right with my initial take. This will be a JSON field on versions.

samdeane commented 2 years ago

I'm wrong again!

😁 I’m glad it’s not just me that this happens to (with my own code).

daveverwer commented 2 years ago

I'm wrong again!

😁 I’m glad it’s not just me that this happens to (with my own code).

It gets worse, too! @finestructure reminded me this morning that we even have a targets table so it won't even be on versions, but a plain field on targets.

@finestructure also had a concern about the cost of pulling in another table to the join, so it may be worth checking the increase in that query cost before proceeding too much further. I can help with how to grab the current query that it's running, if you'd like.

samdeane commented 2 years ago

Have switched to the "has binary targets" perspective for now, and updated the names accordingly.

I wonder if the wording should be "Some targets in this package do not include source code."?

samdeane commented 2 years ago

@finestructure also had a concern about the cost of pulling in another table to the join, so it may be worth checking the increase in that query cost before proceeding too much further. I can help with how to grab the current query that it's running, if you'd like.

No idea how to do that, so yes, that would be good thanks.

Is the performance concern at the point of reading the data - eg when showing the page for the package? I presume that the versions table is already hit? In which case could we add a has_binary_targets flag to Version after all (instead of, or as well as, adding the encoded target properties to Target)?

daveverwer commented 2 years ago

I wonder if the wording should be "Some targets in this package do not include source code."?

This is great for now. We'll probably end up tweaking the final wording once it's all done. The only thing I'd like to make sure we do is to use the word "may" in there somewhere. It's quite possible (not logical, but possible) that a package has a binaryTarget and then also has source somewhere else in the repo.

This is a "One of these things is not like the others" check, an indicator that you need to look carefully to know what you're getting into and "may" gets us out of a lot of trouble. 😂

samdeane commented 2 years ago

Yeah, makes sense. I'll leave it for now - as you say it makes more sense to tweak it when the feature is finished and you can be sure what it actually does :).

samdeane commented 2 years ago

In which case could we add a has_binary_targets flag to Version after all

Obviously that would then push some of the processing back to the analysis phase.

daveverwer commented 2 years ago

@finestructure also had a concern about the cost of pulling in another table to the join, so it may be worth checking the increase in that query cost before proceeding too much further. I can help with how to grab the current query that it's running, if you'd like.

No idea how to do that, so yes, that would be good thanks.

Actually, @finestructure just recently changed this. Up-to-date instructions on logging queries are in #1896. I'll also add an issue to put that documentation into the LOCAL_DEVELOPMENT guide.

Is the performance concern at the point of reading the data - eg when showing the page for the package? I presume that the versions table is already hit? In which case could we add a has_binary_targets flag to Version after all (instead of, or as well as, adding the encoded target properties to Target)?

Yea, and if it's bad, then that's what we'll do. However, I'd rather keep the data and process it as we need it over having very specific fields if we don't need to. It's another join, but should only pull in a couple more records per package. Of course, the scans for those records may be slow, but also might not be!

samdeane commented 2 years ago

Up-to-date instructions on logging queries are in https://github.com/SwiftPackageIndex/SwiftPackageIndex-Server/pull/1896.

Cool - will take a look.

I'd rather keep the data and process it as we need it over having very specific fields if we don't need to

👍

samdeane commented 2 years ago

so it may be worth checking the increase in that query cost before proceeding too much further

So I've figured out how to turn on the query logging, but I'm not sure how to measure/profile its performance. Do you have something built in for this kind of situation, or is there a tool/trick to apply? (As you can probably gather, I am no SQL ninja.)

samdeane commented 2 years ago

As discussed in Discord, I think I might have crafted an approximate query to use to test performance, however...

So I think either we'd need to fetch the array of targets in a separate query and process them, or we might want to consider adopting a different approach for this particular problem (eg having a single flag in Version).

finestructure commented 2 years ago

I believe this is all done and can be closed?

daveverwer commented 2 years ago

Yes. The last part is to change the icon colour but that's tracked in a different issue