bio-tools / biotoolsRegistry

biotoolsregistry : discovery portal for bioinformatics
GNU General Public License v3.0

Collection ids #384

Closed hansioan closed 5 years ago

hansioan commented 5 years ago

@joncison

I found some collection IDs that are longer than 50 characters, which does not conform to the schema 3.0.0 changes. Is this limit something we need to enforce? Perhaps it could be raised to 100.

Here is a list of all collections in bio.tools sorted by string length: Cell Line Integrated Molecular Authentication database and Identification tool https://github.com/waqasuddinkhan/MACARON-GenMed-LabEx MOdels for Data Analysis and Learning - MODAL Bioinformatics and Biostatistics Hub Pasteur Structural Mass Spectrometry and Proteomics USMI Cell Line Database and Analisys Tools Regulatory Sequence Analysis Tools (RSAT) SHOW - Structured HOmogeneities Watcher USMI Biological Resources Catalogues Bioinformatics and Biostatistics Hub http://galaxyapi.web.pasteur.fr RostLab tools, PredictProtein http://www.pubmedcentral.gov/ Medizinisches Proteom-Center http://cbs.dtu.dk/services Bologna Biocomputing Group Python Software Foundation micro-computed tomography Animal and Crop Genomics EBI Tools (ENA Tools) Plant Systems Biology blastTaxoAnalysis1.0 Developed RD-Connect ELIXIR Trainer Tools Developed_RD-Connect EMBOSS at EBI Tools Tel Aviv University BioMedBridges Tools Clustal-Omega_1.1.0 denbi-bioinfra.prot denbi-bioinfra-prot g:Profiler toolkit Masaryk University Odonoghuelab tools EBI Training Tools Bromberglab tools Europe PMC Tools newick-utils_1.6 taxoptimizer_1.1 Institut Pasteur Parkinson Tools njplot_20051109 ViennaRNA_1.8.4 ClustalW_2.0.12 CELLmicrocosmos Czech Republic Thornton Tools ELIXIR-ITA-CNR PredictProtein PredictPtotein de.NBI-biodata Common Disease Neuroconductor Rostlab tools UniProt Tools Ensembl Tools Goldman Tools Instruct CCP4 ELIXIR-Norway CloudBioLinux galaxyPasteur BioInfra.Prot Cytoscape app combinatorics visualization ChEMBL Tools BioCatalogue Bioconductor EMBOSS_6.3.1 blast_2.2.26 squizz_0.99b ProteoWizard Segway Suite Rare Disease BioInfra.Pro denbi-sysbio highlighting ChEBI Tools middle-down phylip_3.67 pdb-lib_1.0 de.NBI-BiGi KMUTT tools GEM-pasteur GEM Pasteur g:Profiler PDBe Tools mview_1.49 Debian Med NTNU tools GigaGalaxy denbi-gcbn tomography RD-connect Sequencing microbiome ELIXIR-CZ MoD Tools EBI Tools ELIXIR.BE ELIXIR-SI 
ELIXIR-NL BIGCAT-UM hmmer3.0 UiO tools Cytoscape CBU tools UiB tools BiB tools ELIXIR-ES Compomics West-Life JensenLab Elixir-EE hackseq17 ELIXIR-NO MBU AV ČR MetalWeb Instruct BioExcel CNB-CSIC WestLife ms-utils GO Tools GenOuest LCC NCBR Genomics JIBtools Bioconda BIG N2N SeqWare jvarkit SEQwiki IRB-BSC WurmLab imaging unipept BSC-IRB overlap CEITEC BIOCEV EMBOSS Mobyle de.NBI OpenMS HD-HuB JABAWS Galaxy DRCAT BioJS UGent BBMRI KNIME SeqAn Luigi RECON GATB CCP4 NBIC VLPB CBS BiGi NeLS SINA NMBU CRAN CBS VIB UiT NMC CWL MuG INB PSB ARB
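
For anyone wanting to reproduce the length check, here is a minimal sketch. It assumes the collection IDs are already available as a plain list of strings; the function name `over_limit` and the constant `MAX_LENGTH` are hypothetical and not part of the bio.tools code base. The sample values are taken from the list above.

```python
MAX_LENGTH = 50  # the limit discussed in this thread


def over_limit(collection_ids, limit=MAX_LENGTH):
    """Return the IDs longer than `limit`, longest first."""
    too_long = [cid for cid in collection_ids if len(cid) > limit]
    return sorted(too_long, key=len, reverse=True)


# A few IDs copied from the list above, in arbitrary order.
sample = [
    "ELIXIR-NL",
    "https://github.com/waqasuddinkhan/MACARON-GenMed-LabEx",
    "Cell Line Integrated Molecular Authentication database and Identification tool",
    "MOdels for Data Analysis and Learning - MODAL",
]

for cid in over_limit(sample):
    print(len(cid), cid)
```

Only the IDs reported by `over_limit` would need shortening if the 50-character limit is kept.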

hansioan commented 5 years ago

@joncison BTW, the new collectionID regexp doesn't validate the collection "Animal and Crop Genomics". We need to have a look at this ASAP.

joncison commented 5 years ago

Size limit of 50 should be enforced, gets crazy otherwise.

"Animal and Crop Genomics" validates just fine against <xs:pattern value="[\p{Zs}A-Za-z0-9+\.,\-_:;()]*"/> - am I missing something?

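The check against the pattern quoted above can be sketched outside an XSD validator as follows. Python's built-in `re` module does not support `\p{Zs}`, so this sketch tests the Unicode category of each character with `unicodedata` instead of compiling the pattern directly. XSD patterns implicitly match the whole string, so every character has to be in the allowed class. The function names are hypothetical, and this is not the validation code bio.tools itself runs.

```python
import string
import unicodedata

# Literal characters from the character class: A-Za-z0-9 + . , - _ : ; ( )
ALLOWED_CHARS = set(string.ascii_letters + string.digits + "+.,-_:;()")


def char_allowed(ch):
    """True if `ch` is in the literal set or is a Unicode space separator (Zs)."""
    return ch in ALLOWED_CHARS or unicodedata.category(ch) == "Zs"


def matches_collection_pattern(collection_id):
    """Whole-string match, mirroring the implicit anchoring of xs:pattern."""
    return all(char_allowed(ch) for ch in collection_id)


for candidate in (
    "Animal and Crop Genomics",
    "Regulatory Sequence Analysis Tools (RSAT)",
    "http://cbs.dtu.dk/services",
    "AV ČR",
):
    print(candidate, matches_collection_pattern(candidate))
```

In this sketch, IDs containing `/` or non-ASCII letters such as `Č` are rejected, while spaces and parentheses are accepted.
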
hansioan commented 5 years ago

@joncison Fixed the collectionID regexp on the bio.tools side. If we keep the 50-character limit, we have to remove the collections that are longer. Very weird that they were allowed to be longer than 50 before and in the new schema they are not.

joncison commented 5 years ago

Refactored the longer (invalid) collection ... so we can close this in due course.