Closed Sherlouk closed 2 years ago
Would a slightly more generic way to reveal this information be to allow target information to be inspected? There are so few packages that are binary only that this maybe feels like it's too specific a feature for me.
Is it a specific feature? Yes, but does it have a great impact in my decision making about whether or not to use a package? Also yes.
I also feel like the reason so few exist currently is because they're predominantly used by closed source companies who are still using CocoaPods and have not moved over to the new world yet
Whether this is available as it's own simple flag, or visible through some other format (such as a more generalised way to inspect the target information), I'm not as fussed so happy to change the subject of this but thought I'd add some more context 👍
I'd probably turn this around and mark the exceptions as "binary only".
As a first stab are you imagining an additional icon on the information page for a package - in this area?
(assuming that we're talking about just a "binary only" flag and not something that lets you inspect target info.)
Thanks for investigating this @samdeane!
(assuming that we're talking about just a "binary only" flag and not something that lets you inspect target info.)
Yep! I never clarified but @Sherlouk made good points and while we may do target inspection one day, this task is just for a "no source code available" or "binary only" warning, which is useful.
As a first stab are you imagining an additional icon on the information page for a package - in this area?
Yes. Absolutely. I think it might fit well as a full-width item towards the top of the metadata. Possibly even just under the archivedListItem
as it’s an important thing to understand about a package before you use it.
I found it useful to think about the UI that we might have for this as part of thinking about it and even though I :+1:’ed @finestructure’s suggestion of “binary only”, that may slightly obscure the issue to people who are not paying full attention (like me, most of the time) and I think we need to explicitly mention that there’s no source code, too.
Some preliminary ideas for wording:
“This package does not include source code. There may be more information available on why in the [README] or [LICENSE] file.”
With [README] being a link to the #readme anchor and [LICENSE] being a link to the license file (which we already have in the page).
Or…
“This package only contains binary targets, meaning that source code may not be available. There may be more information available on why in the [README] or [LICENSE] file.”
I prefer the second one as it does give the possibility of them including source code but just not as a target in the package. We don’t know, so we probably shouldn’t be so definitive with our statement.
I mocked it up and played with the copy in the UI a bit. It's a bit wordy, covering three lines.
This may be better:
Might even be worth bolding the "source code may not be available"
I'm not sure about the orange exclamation mark by the way. This is important to know but it's not necessarily a red (or orange) flag.
I've done the UI changes required for this, but still need to do the code to actually feed into whether the package is binary-only.
Am I right in thinking that the basic test for this is to check the package to see if it only contains binaryTarget
targets? How about if it also contains testTarget
targets (but no plain target
targets?)?
Looking at your Manifest struct, you currently don't seem to record the target type - so I guess that will need changing? Then the analysis stage will then need to read this value for all targets, work out whether the package as a whole is binary-only, and store that somewhere (in a new bool field on the Package entity?)
Am I right in thinking that the basic test for this is to check the package to see if it only contains
binaryTarget
targets? How about if it also containstestTarget
targets (but no plaintarget
targets?)?
Yea I think if it contains test targets but no "normal" targets, I'd still think of that as binary. I think, however, this will be rare. Here are some examples:
This next one is interesting. Two binary targets and a regular target. As a consumer of this feature, if my company had a policy of not using closed source packages, this would get past, but I don't think it should.
I wonder if the flag should be triggered by any binary target. I think it should. The wording will need an update but in some ways that makes the wording easier as we can say something along the lines of "not all source code for this package may be available"
Looking at your Manifest struct, you currently don't seem to record the target type - so I guess that will need changing? Then the analysis stage will then need to read this value for all targets, work out whether the package as a whole is binary-only, and store that somewhere (in a new bool field on the Package entity?)
Yes I don't think we pull any target information out right now. We have a few choices here, depending on how big your appetite is:
Whichever you have the appetite for, it won't be on the Package entity. It will be on Version as it could change from version to version. We'll probably only ever use the information from the default branch "version", but we should process it for all of them.
Quick note from me: Is it worth running a check how many packages would actually have the flag? It feels like this can turn into quite a tricky thing to figure out and then add into analysis - but if it ends up affecting only half a dozen packages it might be easier to have them flag their package as binary in the SPI manifest file.
Quick note from me: Is it worth running a check how many packages would actually have the flag? It feels like this can turn into quite a tricky thing to figure out and then add into analysis - but if it ends up affecting only half a dozen packages it might be easier to have them flag their package as binary in the SPI manifest file.
I fear that if we make this into something package authors would need to opt-in to by manually adding a flag to the .spi.yml
, no one will do it. The warning that this adds potentially dissuades people from using your package, so there's not much incentive for package authors to add it. We also know how hard it is to make people add/change that file, even if it's something that benefits the package!
There's an argument that the effort in altering analysis might not be worth it for this feature, but even if it only affects a few packages, I think it's a good indicator. I know of several companies that do have that policy of not using libraries where the source is not available.
We could grab all the target information and store it in a JSON field
To do this, I think we'd have to extend Manifest.Target
's definition to fully match SPM's model (or as much of it as we cared about). We'd then re-encode it as JSON and store that in the model.
This is because Manifest.Target
is decoded as part of decoding Manifest
- so we can't just grab the JSON that described it, we have to recreate it.
I don't mind doing that, but I'm wondering if we'd end up getting into compatibility problems later if the SPM target model changes? Also, it might be a bit of a YAGNI situation.
Perhaps a compromise is to just add type
to Manifest.Target
for now, but to still re-encode it and store it in the model as a JSON field.
That way we're not over-complicating things now, but there's room for expanding the target info that is stored in the model later, without having to add further fields.
Perhaps a compromise is to just add
type
toManifest.Target
for now, but to still re-encode it and store it in the model as a JSON field.
This sounds perfect to me
That way we're not over-complicating things now, but there's room for expanding the target info that is stored in the model later, without having to add further fields. This sounds perfect to me
Great. Which also means as you said earlier that the logic for this particular feature doesn't have to live in the analysis code.
I'm happy to give that all a stab. Do you make piecemeal migrations when adding fields, or do you just brute-force rebuild the database since it's all generated anyway?
(I've done Vapor stuff before so I know how that side of things works - ish... ;))
Whichever you have the appetite for, it won't be on the Package entity. It will be on Version as it could change from version to version. We'll probably only ever use the information from the default branch "version", but we should process it for all of them.
I made a mistake here. I should have looked it up before speaking! 😅
The data will go on the products
table in the type
field. From a quick look in the current database, we may not be bringing binary targets in at all, but I'll leave you to investigate that. Happy to look at it in more detail if you have questions, though! Here's a DISTINCT
on all the types we have.
Apologies for the mistake!
I'm happy to give that all a stab. Do you make piecemeal migrations when adding fields, or do you just brute-force rebuild the database since it's all generated anyway?
We make migrations. While the data is all generated, that generation would take weeks of processing time (all the builds) so we never reset the database. There are migrations in Sources/App/Migrations
you can use to get started.
In terms of backfilling the data, if that's going to be necessary we have a process we can kick off to re-analyse all versions. We'll figure out if that's necessary as we get closer to merging and we will handle kicking that off.
Whichever you have the appetite for, it won't be on the Package entity. It will be on Version as it could change from version to version. We'll probably only ever use the information from the default branch "version", but we should process it for all of them.
I made a mistake here. I should have looked it up before speaking! 😅
I'm wrong again! It's been a busy day 🙄
Products and targets are totally different things, and I was right with my initial take. This will be a JSON field on versions
.
I'm wrong again!
😁 I’m glad it’s not just me that this happens to (with my own code).
I'm wrong again!
😁 I’m glad it’s not just me that this happens to (with my own code).
It gets worse, too! @finestructure reminded me this morning that we even have a targets table so it won't even be on versions, but a plain field on targets.
@finestructure also had a concern about the cost of pulling in another table to the join, so it may be worth checking the increase in that query cost before proceeding too much further. I can help with how to grab the current query that it's running, if you'd like.
Have switched to the "has binary targets" perspective for now, and updated the names accordingly.
I wonder if the wording should be "Some targets in this package do not include source code."?
@finestructure also had a concern about the cost of pulling in another table to the join, so it may be worth checking the increase in that query cost before proceeding too much further. I can help with how to grab the current query that it's running, if you'd like.
No idea how to do that, so yes, that would be good thanks.
Is the performance concern at the point of reading the data - eg when showing the page for the package? I presume that the versions table is already hit? In which case could we add a has_binary_targets
flag to Version
after all (instead of, or as well as, adding the encoded target properties to Target
)?
I wonder if the wording should be "Some targets in this package do not include source code."?
This is great for now. We'll probably end up tweaking the final wording once it's all done. The only thing I'd like to make sure we do is to use the word "may" in there somewhere. It's quite possible (not logical, but possible) that a package has a binaryTarget and then also has source somewhere else in the repo.
This is a "One of these things is not like the others" check, an indicator that you need to look carefully to know what you're getting into and "may" gets us out of a lot of trouble. 😂
Yeah, makes sense. I'll leave it for now - as you say it makes more sense to tweak it when the feature is finished and you can be sure what it actually does :).
In which case could we add a has_binary_targets flag to Version after all
Obviously that would then push some of the processing back to the analysis phase.
@finestructure also had a concern about the cost of pulling in another table to the join, so it may be worth checking the increase in that query cost before proceeding too much further. I can help with how to grab the current query that it's running, if you'd like.
No idea how to do that, so yes, that would be good thanks.
Actually, @finestructure just recently changed this. Up-to-date instructions on logging queries are in #1896. I'll also add an issue to put that documentation into the LOCAL_DEVELOPMENT guide.
Is the performance concern at the point of reading the data - eg when showing the page for the package? I presume that the versions table is already hit? In which case could we add a
has_binary_targets
flag toVersion
after all (instead of, or as well as, adding the encoded target properties toTarget
)?
Yea, and if it's bad, then that's what we'll do. However, I'd rather keep the data and process it as we need it over having very specific fields if we don't need to. It's another join, but should only pull in a couple more records per package. Of course, the scans for those records may be slow, but also might not be!
Up-to-date instructions on logging queries are in https://github.com/SwiftPackageIndex/SwiftPackageIndex-Server/pull/1896.
Cool - will take a look.
I'd rather keep the data and process it as we need it over having very specific fields if we don't need to
👍
so it may be worth checking the increase in that query cost before proceeding too much further
So I've figured out how to turn on the query logging, but I'm not sure how to measure/profile its performance. Do you have something built in for this kind of situation, or is there a tool/trick to apply? (As you can probably gather, I am no SQL ninja.)
As discussed in Discord, I think I might have crafted an approximate query to use to test performance, however...
name
field instead I guesstype == binaryTarget
)So I think either we'd need to fetch the array of targets in a separate query and process them, or we might want to consider adopting a different approach for this particular problem (eg having a single flag in Version
).
I believe this is all done and can be closed?
Yes. The last part is to change the icon colour but that's tracked in a different issue
I wonder whether it would be possible (good?) to display whether or not a package exclusively uses a binaryTarget or not?
What I mean by this is that there are some Swift packages (example) who's sole purpose are to (re)distribute and make available precompiled xcframeworks to SPM. In these scenarios, there is no source code.