MassBank / RMassBank

Playground for experiments on the official http://bioconductor.org/packages/devel/bioc/html/RMassBank.html

Fix #339 #342

Closed · tsufz closed this 1 year ago

tsufz commented 1 year ago

Export and import replaced by readr versions.
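
For illustration, the kind of swap this refers to (file name and options are placeholders, not the exact changes in this PR): base R import/export replaced by the readr equivalents.

```r
library(readr)

# before: infolist <- read.csv("infolist.csv", stringsAsFactors = FALSE)
infolist <- readr::read_csv("infolist.csv", show_col_types = FALSE)

# before: write.csv(infolist, "infolist.csv", row.names = FALSE)
# write_csv() never writes row names, so no spurious first column appears
readr::write_csv(infolist, "infolist.csv")
```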

tsufz commented 1 year ago

Sorry, fixes #339.

meowcat commented 1 year ago

hi,

1. great initiative to start with the refactoring!

2. I suggest to version at the fourth digit (`3.11.1.2`) for iterations in this repo, and to increase third digit only when pushing to Bioc... at least that's what I did. Other suggestions?

3. XCMS vignette fails now, it seems to have worked before. Any idea why?

tsufz commented 1 year ago

hi,

1. great initiative to start with the refactoring!

2. I suggest to version at the fourth digit (`3.11.1.2`) for iterations in this repo, and to increase third digit only when pushing to Bioc... at least that's what I did. Other suggestions?

3. XCMS vignette fails now, it seems to have worked before. Any idea why?

1. Thanks, came by on WIP :-).
2. I am happy with this. I will change the version to `3.11.1.2` and push again.
3. I can check this locally; maybe I will find the reason.

tsufz commented 1 year ago

I need to check the code again anyway; somehow NA values end up in the records:

COMMENT: INTERNAL_ID 4113
CH$NAME: 
CH$COMPOUND_CLASS: NA
CH$FORMULA: NA
CH$EXACT_MASS: NA
CH$SMILES: NA
CH$IUPAC: NA
tsufz commented 1 year ago

@meowcat, I guess that the downstream functions expect a data frame? But readr::read_csv() imports a tibble, hence we need to transform the imported table?
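
If that is indeed the cause, a minimal sketch of the coercion could look like this (the file name is a placeholder; the actual RMassBank import code may differ):

```r
library(readr)

# read_csv() returns a tibble; coerce it so downstream code that relies on
# classic data.frame behaviour (rownames, drop = TRUE indexing) keeps working
infolist <- readr::read_csv("infolist.csv", show_col_types = FALSE)
infolist <- as.data.frame(infolist)
```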

tsufz commented 1 year ago

Well, for whatever reason `loadInfolist` now puts NA into the imported table. This is the cause of the downstream problems. I will figure this out tomorrow...

tsufz commented 1 year ago

@meowcat I guess all is okay now, so ready for review.

tsufz commented 1 year ago

Okay, the normal workflow fails; I checked locally, and it is something with the infolists due to the refactoring. However, this requires an update of RMassBankData as soon as it works locally. In the meanwhile, we could use the GitHub package for testing.

meowcat commented 1 year ago

I think the most important thing we should do before moving this to Bioc is a regression test for record creation. Basically we store the results of processing the demo dataset, strip the DATE and MS$DATA_PROCESSING: WHOLE entries, and check whether the generated records are still identical.
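
A minimal sketch of such a regression test, assuming testthat and hypothetical paths for the stored reference records and the newly generated ones (not the actual RMassBank test suite):

```r
library(testthat)

strip_volatile <- function(lines) {
  # drop DATE and MS$DATA_PROCESSING: WHOLE, which legitimately change per run
  lines[!grepl("^(DATE:|MS\\$DATA_PROCESSING: WHOLE)", lines)]
}

test_that("generated records match stored reference records", {
  ref_files <- list.files("tests/reference_records", full.names = TRUE)
  new_files <- file.path("tests/generated_records", basename(ref_files))
  for (i in seq_along(ref_files)) {
    expect_identical(
      strip_volatile(readLines(new_files[i])),
      strip_volatile(readLines(ref_files[i]))
    )
  }
})
```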

tsufz commented 1 year ago

Okay, the normal workflow fails; I checked locally, and it is something with the infolists due to the refactoring. However, this requires an update of RMassBankData as soon as it works locally. In the meanwhile, we could use the GitHub package for testing.

@meowcat Ah sorry, the problem is that the BioC RMassBankData is not up to date (see the latest update). I needed to change the data in the CSVs to reflect the naming changes in RMassBank.

Should we change the source of the RMassBankData package to the GitHub main branch until my PR goes through and we can safely update the BioC package?

meowcat commented 1 year ago

Hm, but in this case should we not provide an option to update old infolists? We are breaking backwards compatibility here. I would suggest that we keep "soft compatibility" by providing an `updateInfolist` that rewrites the column names for old infolists.

tsufz commented 1 year ago

@meowcat, I agree with the backwards-compatibility argument, but I am looking forward. I suggest rewriting an old infolist to the new format whenever one is used. The dots in the names are not good; they are a result of an old-school read.csv() or a wrong use of it (as @sneumann mentioned). Now is the time to change it.

tsufz commented 1 year ago

As suggested, the internal infolist file handling is now forward compatible. Old lists are transformed to the new field format (no `.`, but `_`), and the spurious first column is removed (an artefact of an incorrect use of write.csv()).
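
A sketch of that transformation along the lines of the `updateInfolist` idea above (the artefact column names and the exact internals are assumptions, not the code in this PR):

```r
# sketch: normalise an old-style infolist data frame to the new field format
updateInfolist <- function(infolist) {
  # drop the unnamed first column left behind by write.csv(..., row.names = TRUE);
  # after import it typically shows up as "", "X" or "X1"
  first_col <- names(infolist)[1]
  if (first_col %in% c("", "X", "X1")) {
    infolist <- infolist[, -1, drop = FALSE]
  }
  # replace the dots from old read.csv()-style names with underscores
  names(infolist) <- gsub(".", "_", names(infolist), fixed = TRUE)
  infolist
}
```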

tsufz commented 1 year ago

This makes `gatherCCTS` more flexible and uses the full capacity of the data retrieved.

meowcat commented 1 year ago

I'm rerunning this job to see if that was a flaky failure.

meowcat commented 1 year ago

Sorry for the delay. I'm trying to go through this. Right now there is an issue with getDTXCID because it needs the API key... Do we simply omit testing this?

tsufz commented 1 year ago

Sorry for the delay. I'm trying to go through this. Right now there is an issue with getDTXCID because it needs the API key... Do we simply omit testing this?

@meowcat, yes, all the CCTE requests require the API key. However, it is very useful to get the DTXCID/DTXSID and the current CAS number, as they maintain them.

tsufz commented 1 year ago

@meowcat, how should I edit the code to remove the checks? Just delete the run chunk?

meowcat commented 1 year ago

@tsufz, I sent a PR to your dev branch that `\dontrun{}`s the CCTE code.
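
For context, wrapping a documentation example in `\dontrun{}` looks roughly like this (the call and its argument are illustrative, not the exact RMassBank example):

```r
#' @examples
#' \dontrun{
#'   # requires a CCTE API key, so this is not executed during R CMD check
#'   cid <- getDTXCID("hypothetical compound name")
#' }
```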

meowcat commented 1 year ago

OK, I went through it. Nice work! I added one more fix to the workflow (in the PR to your dev branch) to skip the CCTE lookup if no API key is present. Then let's see if the CI goes through...
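
A minimal sketch of such a guard, with an assumed environment variable name rather than the exact code from the PR:

```r
# only attempt the CCTE lookup when an API key is configured
ccte_api_key <- Sys.getenv("CCTE_API_KEY")   # assumed variable name

if (nzchar(ccte_api_key)) {
  message("CCTE API key found, performing DTXCID/DTXSID lookup")
  # ... call the CCTE lookup functions here ...
} else {
  message("No CCTE API key found, skipping CCTE lookup")
}
```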

meowcat commented 1 year ago

A comment: I saw that we now use both httr and httr2, which seems unnecessary. It doesn't need solving today, but should eventually be evened out.
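
For illustration only (the URL and header are placeholders, and this is not RMassBank code): the same GET request written with httr and with httr2, which is the kind of duplication that consolidating on one package would remove.

```r
library(httr)
library(httr2)

url <- "https://example.org/api"   # placeholder endpoint

# httr style
resp_old <- httr::GET(url, httr::add_headers(`x-api-key` = "KEY"))
body_old <- httr::content(resp_old, as = "text", encoding = "UTF-8")

# equivalent httr2 style
body_new <- httr2::request(url) |>
  httr2::req_headers(`x-api-key` = "KEY") |>
  httr2::req_perform() |>
  httr2::resp_body_string()
```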

tsufz commented 1 year ago

A comment: I saw that we now use both httr and httr2, which seems unnecessary. It doesn't need solving today, but should eventually be evened out.

I agree, but it needs some reformatting. I will add an issue and make a note in the wiki about the coding practice.

meowcat commented 1 year ago

We're getting closer: https://github.com/MassBank/RMassBank/actions/runs/5997364100/job/16263609142?pr=342#step:21:73

tsufz commented 1 year ago

@meowcat, I don't know what the issue still is. Your fix was properly merged into my fork...

meowcat commented 1 year ago

All is fine now, right? Then: Merge!